It is the policy of the State Board of Education and a priority of the Oregon Department of Education that there will be no discrimination or harassment on the grounds of race, color, religion, sex, sexual orientation, national origin, age or disability in any educational programs, activities or employment. Persons having questions about equal opportunity and nondiscrimination should contact the Deputy Superintendent of Public Instruction with the Oregon Department of Education.
This technical report is one of a series that describes the development of Oregon’s Statewide Assessment System. The complete set of volumes provides comprehensive documentation of the development, procedures, technical adequacy, and results of the system.
This document provides updated technical adequacy documentation for the Oregon Extended Assessment (ORExt), which is Oregon’s alternate assessment based on alternate academic achievement standards (AA-AAAS). The documentation includes test design and development, the technical characteristics of the assessments, and their uses and impact in providing proficiency data on grade-level state standards, as mandated by the Every Student Succeeds Act of 2015 (ESSA).
The ORExt assessments were redesigned in 2014-15, including a vertical scale in Grades 3-8 in English language arts and mathematics to support eventual determinations of student growth over time. The test is aligned to Essentialized Standards (EsSt) that are part of comprehensive Essentialized Assessment Frameworks (EAFs), written at three levels of complexity (low, medium, and high). The EsSt are linked to grade-level content and expectations, but systematically reduced in terms of depth, breadth, and complexity (RDBC). All ORExt items employed in the 2016-17 administration were developed in 2014-15. An alignment study conducted at that time determined that all items were aligned to the new EsSt.
A statewide sample of Oregon general and special education teachers reviewed all test items for: 1) alignment to the EAFs, 2) accessibility for students with significant cognitive disabilities, 3) sensitivity, and 4) bias. All operational items met the established criteria. Achievement Level Descriptors (ALDs) were also reviewed for alignment to the EsSt. See Sections 1.1, 1.2, 6.1, and 6.3 for additional information related to the comprehensive linkage of grade-level standards to the EsSt, as well as the alignment of items to the EsSt.
The ORExt test design supports student access, including read-aloud support for directions and prompts, presentation of one item per page, and items designed at three levels of complexity, where low-complexity items include graphic and/or object support. For assessors, the scoring process has also been simplified: answers are scored either correct (1) or incorrect (0), and partial credit is no longer part of the scoring metric for the ORExt. In addition, the one-item-per-page format not only helps students focus their attention, but also reduces the burden on assessors to mask items that are not being tested. The field has been appreciative of the redesign, particularly the Essentialized Standards and the new access and efficiency features.
In addition to developing and reviewing/editing over 5,000 new items, conducting an operational field test, and developing a vertical scale, the development of a new ORExt required that new Alternate Academic Achievement Standards be developed and approved. Comprehensive Standard Setting meetings were conducted on June 15-17, 2015, and the resulting achievement level descriptors (ALDs) and cut scores for the assessments were approved by the Oregon State Board of Education on June 25, 2015. Comprehensive Annual Measurable Objective (AMO) reports were finalized on July 10, 2015.
Though an alignment study was conducted in the fall of 2014 as described above, Non-Regulatory Guidance from the U.S. Department of Education, published on September 25, 2015, included an expectation that all alignment studies must be independent (see Critical Element 3.1). An independent contractor, Dr. Dianna Carrizales, was therefore hired to perform an additional alignment study in the spring of 2017. Complete results are presented in this technical report (see Section 3.1A-B & 3.2).
In addition to the independent alignment study, a pilot tablet administration study was also conducted in the 2016-17 school year. This is the second phase of a three-year plan to make tablet administration of the ORExt available by the 2017-18 school year. A summary of the pilot tablet administration study is provided in Section 2.3C. Complete results from the pilot tablet administration study, phase two, are presented in Appendix 2.3C.
The independent alignment study and pilot tablet administration study are part of our five-year technical documentation plan. Future plans include an inter-rater reliability study, as well as analyses of the impact of accommodations.
The Oregon State Board of Education (SBE) adopted new, challenging academic content standards, the Common Core State Standards (CCSS), in English language arts and mathematics in Grades K-12 on October 28, 2010. The CCSS are utilized for all students in Oregon’s public schools. Oregon was actively involved in the development of the CCSS, as the Oregon Department of Education (ODE), the Educational Enterprise Steering Committee (EESC), Oregon’s Education Service Districts, and school district representatives provided feedback on the draft standards.
Similarly, the SBE adopted the Next Generation Science Standards (NGSS) on March 6, 2014. The NGSS establish learning targets for all students in Oregon’s public schools in Grades K-12. The ODE and the Oregon Science Content and Assessment Panel provided direct feedback related to the NGSS. The NGSS are being phased in over time instructionally, so students are being assessed relative to the Oregon Science (ORSci) standards that were adopted in 2009.

State Content Standards

The tables below provide examples of essentialized standards in Grades 5, 8, and 11 in the subject areas of English language arts (ELA), mathematics, and science. Complete EAF spreadsheets are available at the link provided here, as well. The right column designates the estimated difficulty of an item: L (low), M (medium), or H (high). [Complete EAF Spreadsheets](http://www.brtprojects.org/publications/training-modules)
See Appendix 1.1 for a User Guide that explains the development process and intended uses for the EAFs.
The CCSS, ORSci, and NGSS define what students in Oregon should know and be able to do by the time they graduate from high school. The CCSS, which were developed by national stakeholders and education experts, have been determined to be coherent and rigorous by researchers at the Fordham Institute (see Appendix 1.2). They were also developed with wide stakeholder involvement, particularly in Oregon. The new ORExt is linked directly to the content in the CCSS in English language arts (reading, writing, and language) and mathematics. The ORExt is dually linked to the ORSci as well as the NGSS. The NGSS are widely accepted by most relevant science instruction organizations as reflective of rigorous and coherent science concepts (see NGSS Link).
The new Essentialized Assessment Frameworks (EAFs) are publicly available at the link in the State Content Standards section above. A User Guide is provided to instruct educators regarding the intended uses of the Essentialized Standards (EsSt), including the development of Present Levels of Academic Achievement and Functional Performance (PLAAFP) statements and Individualized Education Program (IEP) goals and objectives. The basic essentialization process employed to generate essentialized standards and write aligned items for the ORExt is outlined below. The process can also be used to support the development of curricular and instructional materials, founded in research-based pedagogy.
The ORExt assessments were administered in the 2016-17 school year in ELA and mathematics in Grades 3-8 and in Grade 11; science was assessed in Grades 5, 8, and 11. This assessment plan meets the requirements for grade-level assessment in Grades 3-8 and once in high school (Grades 10-12) for ELA and mathematics, while science is assessed once in the 3-5 grade band, once in the 6-9 grade band, and once in the 10-12 grade band:
| Content Area | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 11 |
|---|---|---|---|---|---|---|---|
| English Language Arts | X | X | X | X | X | X | X |
| Mathematics | X | X | X | X | X | X | X |
| Science |  |  | X |  |  | X | X |
Originally, Oregon statute required that all students participate in statewide assessments, with exceptions allowed for district-approved parent requests for assessment waivers (parent opt-out requests) related to student disability or religious beliefs (see Oregon Administrative Rule, OAR § 581-022-0612):
Exception of Students with Disabilities from State Assessment Testing
(1) For the purposes of this rule a “student with a disability” is a student identified under the Individuals with Disabilities Education Act, consistent with OAR chapter 581, division 015, or a student with a disability under Section 504 of the Rehabilitation Act of 1973.
(2) A public agency shall not exempt a student with a disability from participation in the Oregon State Assessment System or any district wide assessments to accommodate the student’s disability unless the parent has requested such an exemption.
However, House Bill 2655 established a Student Bill of Rights on January 1, 2016, which permitted parents or adult students to annually opt-out of Oregon’s statewide summative assessments, pursuant to OAR § 581-022-1910.
The Governor published a memorandum for Superintendents, Principals, and District Test Coordinators related to the change (see Appendix 1.4.1).
The expectation that all students in the assessed grades participate, including students with disabilities, is elaborated clearly and pervasively across all guidance documents. For example, the Oregon Test Administration Manual (TAM) states that “All students enrolled in grades 3-8 and in high school must take the required Oregon Statewide Assessments offered at their enrolled grade, including students re-enrolled in the same grade as in the prior year, unless the student receives a parent-requested exemption…” (see Appendix 1.4.2, p. 96).
English learners are included as appropriate in Oregon’s statewide assessment system (see Appendix 1.4A.1, pp. 31-33). The Smarter Balanced assessment directions are translated into multiple languages and available via the OAKS portal. OAR 581-022-0620(2) requires ODE to provide translated OAKS assessments for language populations at or above 9% in Grades K-12 within three years after the school year in which the language exceeds the threshold (see Appendix 1.4A.2). In addition, the accommodations available to students who participate in the ORExt include translation into the native language, where appropriate (see Appendix 2.3A1, pp. 36-43).
The ORExt is not administered in a native language format, though it can be translated into a student’s home language.
Oregon’s participation data indicate that most students in the tested grade levels are included in the assessment system. In 2015-16, the most current data available at the time of this report, the students with disabilities subgroup did not meet minimum participation requirements in English language arts or mathematics, with rates of 92.3% and 91.5%, respectively. See the table below for a summary of participation; complete reports are available via the complete reports link.
Documentation of this requirement is provided within the Annual Performance Report, Indicator B3, which is submitted to the United States Department of Education’s (USED’s) Office of Special Education Programs (OSEP). Participation and performance summaries are provided below. Additional information regarding state performance is published in the 2015-16 State Report Card (see Appendix 1.5, pages 1-11 for student and teacher demographics and pages 21-48 for assessment information).
The test specifications document that describes our approach to assessment and test design for the ORExt is published in Appendix 2.1. The document includes our approach to reducing the depth, breadth, and complexity (RDBC) of grade level content standards, an overview of the essentialization process and EAF documents, the planned test design for the ORExt, test development considerations, sample test items, item specifications, and universal tools/designated supports/accommodations. No new items were developed in 2016-17, so the 2014-15 test specifications are the most current available.
The stated purpose of the ORExt is to provide the state technically adequate student performance data to ascertain proficiency on grade level state content standards for students with significant cognitive disabilities. A long-term goal of the program is to also provide information regarding annual student growth related to these content standards over Grades 3-8, as measured by vertically scaled assessments in ELA and mathematics. The results of the assessment are currently reported in comparison to four performance levels: Level 1, Level 2, Level 3, and Level 4. Levels 3 and 4 denote a proficient level of performance, while Levels 1 and 2 denote performance that is not proficient. BRT and ODE developed a scaled score interpretation guide to assist stakeholders in interpreting the meaning of the scaled scores generated by the ORExt, supported by the state’s achievement level descriptors. This guidance is published in Appendix 2.1A.
Appendix 2.1B includes the entire test blueprint for the ORExt, as conveyed by the balance of representation across content areas and domains. Field-testing is conducted each year to support the continuous improvement of test functioning, and field-test items are selected to maintain this balance of representation. Oregon teachers validated the content of the assessment, agreeing with the standards that were and were not selected to develop the Essentialized Standards to which the ORExt test items are aligned.
The test development process implemented for the ORExt is conveyed in Appendix 2.1C, including standard selection and validation, item development, item review, review of all Oregon teacher feedback and updating of items, and scaling and item selection. The Appendix articulates the process used to generate the materials, with comma-separated value files used to create item templates that fed into Adobe InDesign© through a data merge. Final test packages are reviewed for accuracy and content and then disseminated via secure file transfer to Oregon Qualified Assessors.
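As a rough illustration of this materials pipeline, the sketch below writes an item table to a comma-separated value file of the kind that can feed an InDesign data merge. The column names and items are hypothetical placeholders, not the actual ORExt templates.

```r
# Hypothetical sketch of preparing a CSV for an InDesign data merge;
# column names and content are placeholders, not actual ORExt fields.
library(tibble)

items <- tribble(
  ~item_id, ~stem,                          ~option_a, ~option_b, ~option_c,
  "M3_001", "Which number is the largest?", "4",       "7",       "2",
  "M3_002", "Count the circles.",           "3",       "5",       "6"
)

# Each row becomes one record in the data merge; each column maps to a
# text or image frame placeholder in the InDesign template.
write.csv(items, "item_templates.csv", row.names = FALSE)
```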
The ORExt is not a computer-adaptive instrument, so concerns specific to adaptive test delivery do not apply.
Item writers were recruited by ODE staff on May 20, 2014 via an existing Qualified Assessor/Qualified Trainer listserv, using the following text:
Behavioral Research and Teaching at the University of Oregon is recruiting Oregon teachers to participate in item development for a new alternate assessment, to be field tested in the spring of 2015. Selected teachers will be asked to develop 360 items in English Language Arts, Mathematics, or Science over the course of the summer, from mid-June through end of August 2014. The Project Director will work with lead item developers to provide training, ongoing review and feedback, and quality assurance. All participants will be expected to provide documentation of their qualifications and sign test security agreements. In addition, all item developers will be expected to participate in a half-day item development training based upon the following schedule:
- ELA - Tuesday, June 17, 2014 from 8 AM to 12 PM
- Math - Wednesday, June 18, 2014 from 8 AM to 12 PM
- Science - Thursday, June 19, 2014 from 8 AM to 12 PM
All licensed Oregon public school teachers with at least three years of teaching in a life skills/severe needs program (SPED) or a general education classroom (GEN-ED), respectively, are encouraged to apply. Preference will be given for item writing experience, additional years of teaching experience, and higher education degree status.
Teachers who participate in this process will be compensated at a rate of $20/hr via professional service contracts. It is anticipated that teachers will produce 4 ELA items/hr, 6 Science items/hr, and 8 Math items/hr. As such, the maximum contract amount for ELA will be $1,800, for Science $1,440, and for Math $900. Item development will focus primarily on writing the stem and 3 options, with no need to produce graphics (rather use labels for a BRT graphic designer to produce).
If you meet the above qualifications and are interested in applying to assist on this project, please contact Dan Farley at dfarley@uoregon.edu or at 541-346-3133. The deadline to apply is June 13, 2014. Thank you.
Because the timeline required work over the summer, Oregon teacher recruitment was challenging. BRT researchers therefore performed additional on-campus recruitment within the College of Education (COE) using the same information. The final pool included 18 item writers: seven Oregon teachers (all with MA degrees), five PhD candidates within the COE, and six BRT researchers (four PhD candidates, one with a PhD, and one with an MA). Item writers averaged 11.5 years of teaching experience. The teachers recruited all had prior experience developing items for the ORExt, as did all of the BRT researchers. The five PhD candidates within the COE had no prior item development experience. All item development was reviewed by BRT researchers and the Project Manager.
The item development process is elaborated in Appendix 2.2.1, which contains the PowerPoint used to train all Oregon item writers. The process was structured with the following steps. Item writers were first oriented to the student population, as the pool of item writers included both content and special education experts. The Essentialization Process used to RDBC grade-level standards was then modeled so writers would understand how the item alignment targets, the Essentialized Standards, were generated. Lecture, guided practice, independent practice activities, and follow-up discussion ensured comprehension of the process. BRT staff developed exemplar items for every Essentialized Standard, varying the complexity from Low (L) to Medium (M) to High (H) to convey the different performance expectations at each level. The balanced vertical scaling design provided an overall form-to-form and grade-to-grade framework for the test formation process once items were developed (see Appendix 2.2.2). Sample items are provided in Appendix 2.2.3 for stakeholder reference, demonstrating the format and style of typical items on the ORExt.
The ORExt assessments are administered according to the administration, scoring, analysis, and reporting criteria established in the ORExt General Administration Manual (see Appendix 2.3). Important updates to the testing process are distributed via the Assessment and Accountability Updates listserv (see Updates link). ODE uses this system to communicate information that is relevant to the statewide assessment system, including the ORExt. Announcements are sent to the listserv by email and are also posted to the ODE website. The standardization of test administration is supported by a comprehensive training process described in Section 2.3B below.
The state has ensured that appropriate universal tools, designated supports, and accommodations are available to students with disabilities and students covered by Section 504 by providing guidance and technical support on accommodations (see Appendices 2.3A.1 and 2.3A.2). Guidelines regarding use of the accommodations for instructional purposes are included in the document, as all students are expected to receive test accommodations that are consistent with instructional accommodations.
Accommodations are built into the flexibility provided by the ORExt test design, though their effects have not yet been researched for the ORExt. However, annual training and proficiency testing efforts related to becoming a qualified assessor and/or qualified trainer for the ORExt support standardized use of available accommodations that are not already part of the test design. Annual analyses demonstrate that student performance varies according to student ability rather than construct-irrelevant factors, such as sex, race, or ethnicity (see Section 4.2).
The state has ensured that appropriate accommodations are available to students with limited English proficiency by providing guidance and technical support on accommodations (see Appendix 2.3A.1). Communication systems for this student population are limited; exposure to multiple languages can make a student’s communication system more complex. The ORExt uses universal design principles and simplified language approaches in order to increase language access to test content for all students. In addition, directions and prompts may be translated/interpreted for students in their native language.
An analysis of accommodated versus non-accommodated administrations is needed in order to demonstrate that the provision of language accommodations does not advantage students with limited English proficiency or disadvantage other participants. Accommodations information was collected this year as an optional data entry field; entering this information will be required next year. Analyses of the impact of accommodation provision on the ORExt should thus be feasible after the spring 2018 administration.
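Once accommodations data entry is mandatory, a comparison along these lines could be run. The sketch below is a hypothetical illustration (the data frame `scores` and its columns are invented for this example), not a specified ODE analysis plan.

```r
# Hypothetical sketch of an accommodated vs. non-accommodated comparison.
# The data frame `scores` and its columns are invented for illustration.
library(dplyr)

compare_accommodations <- function(scores) {
  # Summarize performance by accommodation status
  summary_tbl <- scores %>%
    group_by(accommodated) %>%
    summarize(n = n(), mean_score = mean(scaled_score), sd = sd(scaled_score))
  print(summary_tbl)

  # Simple mean comparison; a full analysis would also control for student
  # characteristics and test for differential item functioning
  t.test(scaled_score ~ accommodated, data = scores)
}
```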
The Oregon Extended assessments can also be administered in Large Print and Braille (contracted and non-contracted) versions. Oregon has ensured that the Oregon Extended assessments provide an appropriate variety of accommodations for students with disabilities. The state has provided guidance on accommodations in presentation, response, setting, and timing in the Accommodations Manual 2013-14: How to Select, Administer, and Evaluate Accommodations for Oregon’s Statewide Assessments (see Appendix 2.3A.2). The Oregon Extended assessments are also designed according to universal design principles and utilize a simplified language approach (see Appendix 2.3A.3).
In the 2013-14 school year, the state developed a training and proficiency program for sign language interpretation of its assessments and has updated the site annually since that time. The training process (see Training link) includes videos of interpreters administering items to students, materials that support appropriate administration (i.e., transcripts and PowerPoint slides that supplement the video administrations and the current ODE accommodations manual), and proficiency testing to support standardized interpretation for Oregon’s assessments, including the ORExt. A 10-item proficiency test was administered, with 80% (8/10 items correct) required for passing. In 2016-17, the site was used to train 60 participants. All participants passed the assessment on the first attempt, and the overall average score on the proficiency test was 97.6%.
The ORExt assessments provide an appropriate variety of linguistic accommodations for students with limited English proficiency. They also use a simplified language approach in test development in order to reduce language load of all items systematically (see Appendix 2.3A3). Any given student’s communication system may include home signs, school signs, English words, and Spanish words, for example. With the exception of items that require independent reading, the ORExt assessment can be translated or interpreted by a Qualified Assessor (QA) working with an interpreter in the student’s native language, including American Sign Language. QAs are allowed to translate/interpret the test directions. QAs can adapt the assessment to meet the needs of the student, while still maintaining standardization due to systematic prompts and well-defined answers.
Comprehensive information for ongoing training of all Qualified Assessors (QAs) and Qualified Trainers (QTs) is provided in Appendices 2.3B.1-2.3B.8. Training and QA/QT proficiency is determined annually via an online distribution and assessment system located at the QA/QT website (link). This website hosts all resources and information needed to administer, score, report, and interpret the results from the ORExt. The website also includes proficiency assessments that are required for all QAs and QTs who may administer the ORExt. QTs are directly trained by ODE and BRT staff as part of a train-the-trainers model. QTs then provide direct trainings for new QAs in their respective regions.
The Oregon Department of Education (ODE) provided four direct statewide trainings for new Qualified Trainers (QTs) and Qualified Assessors (QAs) in face-to-face regional trainings. The schedule for the regional trainings, as well as relevant training information, is provided below:
Only trained Qualified Assessors (QAs) can administer the Oregon Extended assessment. Qualified Assessors who also receive direct instruction from ODE and BRT may become Qualified Trainers (QTs) who are certified to train local staff using the train-the-trainers model. Training for new assessors must be completed on an annual basis. Assessors who do not maintain their respective certifications for any given year must re-train if they choose to enter the system again.
The tables below contain data from the Oregon Extended Assessment Training and Proficiency Website (see QA/QT website link). All assessors must complete some form of training each year to retain their status for administering the Extended Assessments.
New assessors and returning assessors who needed further training in 2016-17 were required to pass four proficiencies with a score of 80% or higher. These four proficiencies were in Administration, English Language Arts (ELA), Mathematics, and Science. Returning QAs or QTs for the 2016-17 school year only needed to pass a Refresher Proficiency, again with a score of 80% or higher. The tables below contain data on the number of assessors (participants) in each of the four proficiencies, as well as the Refresher Proficiency. Included in the data is the number of attempts needed to attain a passing score as well as the average passing score of the participants.
An analysis of the Oregon Extended Assessment Training and Proficiency Website showed 353 Assessors-in-Training, 1,030 Qualified Assessors, and 137 Qualified Trainers.
More assessors completed the Refresher Proficiency test than the subject-area proficiency tests, reflecting a greater number of returning assessors compared to new assessors. The Administration Proficiency continued to be the most challenging for new assessors, but most were able to pass on the first or second attempt, with less than 2% of assessors requiring more than two attempts. The majority of assessors passed the ELA, Math, Science, and Refresher proficiency tests on the first attempt, with less than 4% requiring a second or third attempt. There were 73 fewer Qualified Assessors but 8 more Qualified Trainers compared to last year.
Evaluations are collected at each QT training in November. The results reflect general approval, but also suggest areas of improvement that ODE and BRT work on for subsequent trainings/subsequent years, as appropriate. QT evaluations this year included positively worded statements regarding the quality of training rated on a scale where 1 = Strongly Disagree, 2 = Disagree, 3 = Agree, and 4 = Strongly Agree.
The first section evaluated the state-level information and the knowledge of the ODE presenters, the participants’ level of comfort with the training provided, the participants’ ability to carry the training and materials back to train district staff, and the overall utility of the training. Seventy-eight percent of participants strongly agreed with these statements, 19% agreed, and less than 3% disagreed or strongly disagreed, collectively. In the second section, participants were asked to evaluate the BRT trainers and their guidelines regarding how to use the training and proficiency website and related resources. Seventy-nine percent of participants strongly agreed with these statements, 19% agreed, and less than 2% disagreed or strongly disagreed, collectively. Overall, these results demonstrate that participants felt the training was high quality and were confident that they could train their staff upon return to their respective districts with the knowledge and resources gained.

This year’s QT training cycle included an optional afternoon session for any interested educators on how to essentialize grade-level content standards and how to develop curriculum and provide instruction aligned to those standards for students who are functioning off grade level, with a focus on students with significant cognitive disabilities (SWSCD). We asked participants to rate their confidence in using the knowledge acquired during the session, as well as to evaluate the quality of the presentation and materials, on a four-point scale (Strongly Disagree, Disagree, Agree, Strongly Agree). The survey for the afternoon session was conducted online with Qualtrics software.

Percentages of responses for each statement used in the survey are provided below. The first table provides a summary of the data related to participant confidence, while the second provides evaluations of the quality of the presentation. Respondent n-sizes ranged from 26-30, depending upon the question. A bar graph of study results is provided below, followed by tables of descriptive statistics. The data visualization was produced with ggplot2 in the tidyverse package (Wickham, 2017).
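For reference, the sketch below shows how a summary graph of this kind can be produced with ggplot2. The data frame and percentages are hypothetical placeholders, not the survey results reported here.

```r
# Hypothetical sketch of the ggplot2 approach used for the survey graphs;
# the data values below are placeholders, not actual survey results.
library(tidyverse)

survey <- tribble(
  ~statement,                        ~response,        ~pct,
  "I can essentialize standards",    "Strongly Agree", 40,
  "I can essentialize standards",    "Agree",          50,
  "I can essentialize standards",    "Disagree",       10,
  "I can align instruction to EsSt", "Strongly Agree", 55,
  "I can align instruction to EsSt", "Agree",          40,
  "I can align instruction to EsSt", "Disagree",        5
) %>%
  mutate(response = factor(response,
    levels = c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree")))

ggplot(survey, aes(x = statement, y = pct, fill = response)) +
  geom_col() +     # stacked bars of response percentages per statement
  coord_flip() +   # horizontal bars keep long statement labels readable
  labs(x = NULL, y = "Percent of respondents", fill = "Response")
```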
[Bar graph of study results]
Note: The first two graphs present participants’ confidence in their ability to use the information presented in the areas of essentialization and curriculum and assessment, respectively. The next four graphs convey evaluations of the presenters for the curriculum and instruction and essentialization trainings. Results are very positive, with some reviewers feeling less confident about their ability to train others in the essentialization process. This outcome was expected: the process is complex, and this was the first time participants had received such training.
[Confidence Scale Percentages table]
In addition, all technical assistance questions that we receive from the field as part of our HelpDesk are documented. The log of the technical assistance provision is reviewed each month, as well as annually, in order to determine what aspects of our assessment system need further clarification or improvement. The HelpDesk log is published in Appendix 2.3B.9.
Oregon monitors the quality of its system in several ways in order to support continuous improvement. In terms of the assessment quality, item statistics are reviewed each year and items that are not functioning as intended are removed and replaced by better functioning field-test items.
In 2014-15, items were reviewed in two phases, first using classical test theory (CTT) and second using Rasch analyses. All items flagged as a result of the statistical reviews were analyzed, item by item, by a team of measurement and content experts at BRT. Not all flagged items were removed, as several did not have apparent design flaws. Domain representation and item difficulty range were also considered during the review process. We also employed different decision rules for unique items versus horizontally or vertically scaled anchor items, as it was important in many cases to maintain anchor items. Items with clear design flaws were removed from subsequent analyses and reporting. Flagging criteria based on both CTT and Rasch item statistics were employed.
Out of a total of 5,929 items developed in 2014-15, 166 were removed (2.8%).
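As a hedged illustration only, a CTT-based screen of the kind described might look like the following sketch; the thresholds shown are hypothetical, not Oregon's actual flagging criteria.

```r
# Illustrative CTT item screen; thresholds are hypothetical, not the
# actual ORExt flagging criteria.
library(tidyverse)

# resp: data frame of dichotomous (0/1) item scores, one column per item
flag_items <- function(resp, p_min = 0.10, p_max = 0.95, ptbis_min = 0.15) {
  total <- rowSums(resp)
  tibble(
    item  = names(resp),
    p_val = colMeans(resp),                       # proportion correct (difficulty)
    ptbis = map_dbl(resp, ~ cor(.x, total - .x))  # corrected item-total correlation
  ) %>%
    mutate(flagged = p_val < p_min | p_val > p_max | ptbis < ptbis_min)
}

# Flagged items would then receive an item-by-item expert review rather
# than automatic removal, mirroring the process described above.
```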
We also implement a consequential validity study each year that surveys QAs and QTs regarding the academic and social consequences of the ORExt, both intended and unintended. The Consequential Validity report is published in Appendix 2.3B.10. ODE and BRT staff review the results of the survey annually to determine what program improvements are needed. A summary of the results is provided below.
ODE implemented a research survey program to address the need to document the consequences, both intended and unintended, of the ORExt assessments. The research questions were framed based upon current consequential validity approaches for alternate assessments in the literature, as well as issues that are of specific value in Oregon. The survey included 344 respondents, 25% of those solicited, all of whom were Qualified Assessors (QAs) or Qualified Trainers (QTs) in the or.k12test.com database. The sample was 84% female and represented all regions of the state, as well as a range of ages. The survey included a range of quantitative and qualitative components. The quantitative results demonstrate that QAs and QTs continue to feel that the ORExt test items were easy to administer and score (58% Strongly Agree) and felt confident in their ability to interpret scaled scores and Achievement Level Descriptors for the ORExt (54% Agree). They also felt that the items were accessible for students who participated (53% Agree) and that the ORExt reflected the academic content that SWSCD should be learning (57% Agree). QAs and QTs felt marginally positive about the educational impacts of the ORExt and marginally negative about its social impacts. The results again demonstrate that the ORExt content area assessments generally require up to one hour to administer.
The qualitative results revealed two areas in which educators appreciated the ORExt and four areas of needed improvement. QAs and QTs said that they appreciated: 1) the assessment’s efficiency (i.e., more streamlined administration, ease of administration, easier to give and score online, online materials distribution); and, 2) overall item and test design (i.e., one item per page, visual supports, scoring protocol and student materials design, accessibility of test questions). Teachers recommended the following areas of improvement, not all of which are actionable: 1) Option to administer the assessment electronically, 2) A functional skills assessment, 3) New items for very low functioning students should be developed, and 4) A math assessment composed of more practical/life skills problems involving time and money. Complete results, including anticipated responses, from the survey can be found in Appendix 2.3B.10.
The ORExt was implemented on a small scale using a technology-based platform as part of Phase 2 of the ORExt Pilot Tablet Administration study conducted in spring 2017. The current plan is to make tablet-based administration of the ORExt available statewide next year, in 2017-18. A complete report of the results of the study, including the research plan and lessons learned, is provided in Appendix 2.3C. In short, this year’s tablet administration study demonstrated that QAs support a tablet administration of the ORExt at the statewide level. The study results also demonstrated that additional training must be provided for the manual scoring process for writing. In addition, the administration of the ORora for students whose ORExt testing is discontinued after they have met the minimum participation rule will be incorporated into the tablet administration next year. To support training and understanding of the system by both teachers and students, practice items in a tablet format will be provided for stakeholders to prepare for future tablet-based administrations. This year’s study addressed issues related to database communication systems to ensure data security and accurate data storage and access at the district level. The paper/pencil version will continue to be available for students who cannot access a tablet administration.
The ODE maintains a rigorous training system to support standardized test administration for the ORExt, located at a secure website (see the screenshot below for an example of training content).
The or.k12test.com website includes a training section that addresses: any systems updates; the process for becoming a Qualified Assessor or Qualified Trainer; student eligibility expectations; student confidentiality and test security; test administration and scoring expectations; examples of appropriate and inappropriate administration (video); supporting student access to items without violating the test construct; content area trainings that demonstrate how to administer items in ELA, Math, and Science (video, with supporting test materials); and how to access secure tests and complete data entry. Information for QAs, QTs, and parents regarding the ORExt is also provided, as are all necessary support materials. For QAs, these materials include practice tests to prepare both themselves and students for the annual assessment, as well as all of the training materials used on the website. In addition, QTs have access to all training materials necessary to provide annual training to QAs in their purview (see screenshot below):
In addition, monitoring and reporting related to test administration issues for the ORExt is addressed via general ODE reporting systems. Information regarding this process can be located in the general assessment system Peer Review evidence submission.
Test security policies and consequences for violation are addressed in the Test Administration Manual on an annual basis (see Appendix 1.4.2, pp. 29-33). These policies include test material security, proper test preparation guidelines and administration procedures, consequences for confirmed violations of test security, and annual training requirements at the district and school levels for all individuals involved in test administration. Consequences for adult-initiated test irregularities may be severe, including placing teaching licenses in jeopardy (see Appendix 1.4.2, pp. 31-33).
The ODE utilizes a localized monitoring system in which school test coordinators oversee building-level administration by trained Qualified Assessors and report to centralized district test coordinators, who are then responsible for reporting any confirmed violations to ODE. Improprieties are defined as adult-initiated or student-initiated and investigated accordingly (see Appendix 1.4.2, pp. 29-31).
ODE’s alternate assessment program manager investigates and remediates substantiated test security incidents for the ORExt by working with district test coordinators. Additional information regarding this process can be located in the general assessment system Peer Review evidence submission.
School and district test coordinators conduct initial investigations into all alleged test irregularities. Once reported to ODE, all alleged test irregularities are investigated in consultation with district test coordinators and the test vendor, as appropriate (see Appendix 1.4.2, pp. 31-33). In the event that a test irregularity is determined to be factual, consequences are determined based upon contextual issues brought to light during the investigation. Additional information regarding this process can be located in the general assessment system Peer Review evidence submission.
Test materials for the ORExt are maintained throughout development, dissemination, and administration via multiple mechanisms. All items under development are stored on secure file servers managed by Behavioral Research & Teaching at the University of Oregon, the test vendor for the ORExt. Item reviews necessary to provide alignment, bias, and sensitivity information are conducted online using the secure Distributed Item Review (DIR) platform (see DIR link; the site is secure, but see Appendix 3.1B for a system overview).
For the 2016-17 school year, all secure test distribution and data entry was hosted by ODE’s secure file transfer system, a password-protected test distribution and data entry system located at the secure file link (the site is secure, but see the screenshot below for reference). A data entry guide is provided in Appendix 2.6.
Additional information regarding test security can be located in the general assessment system Peer Review evidence submission.
Student-level data are protected through relevant training and a secure data system in which all data entry is conducted online using password-protected, secure procedures on the Oregon K12 website or the secure data link websites identified above. Only trained users with a vested educational interest who have signed test security agreements are authorized to access the online data entry systems. See Appendix 2.6 for additional data entry expectations for 2016-17.
All confidential, personally identifiable student information is protected by policy and supported by training (see Appendix 1.4.2, p. 26). The minimum number of students necessary to allow reporting of students and student subgroups varies by rating type (i.e., achievement, growth, graduation, and school size), by level (i.e., school/district/state), and by the number of years of assessment data available. For example, to receive an achievement rating, schools must have at least 40 tests for the two most recent school years in reading or mathematics. Alternatively, small schools receive an achievement rating if they have at least 40 tests over the most recent four years. If a school does not have at least 40 tests over a four-year period, it will not receive an achievement score (see Appendix 2.6C). Similar rules are applied to student subgroups, including students with disabilities, English learners, and students from diverse racial/ethnic backgrounds (see Appendix 2.6C, p. 7).
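As a simple sketch of the achievement-rating rule just described (the function below is illustrative, not ODE's implementation):

```r
# Sketch of the minimum-n rule for achievement ratings described above:
# at least 40 tests in the two most recent years, or, for small schools,
# at least 40 tests over the most recent four years. Illustrative only.
receives_achievement_rating <- function(tests_per_year) {
  # tests_per_year: test counts for the four most recent years, newest first
  sum(tests_per_year[1:2]) >= 40 || sum(tests_per_year[1:4]) >= 40
}

receives_achievement_rating(c(25, 18, 10, 9))  # TRUE: 43 tests in two years
receives_achievement_rating(c(12, 11, 10, 9))  # TRUE: 42 tests over four years
receives_achievement_rating(c(8, 9, 10, 9))    # FALSE: only 36 tests over four years
```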
As elaborated by Messick (1989), the validity argument involves a claim with evidence evaluated to make a judgment. Three components of assessment systems are essential: (a) constructs (what to measure), (b) the assessment instruments and processes (approaches to measurement), and (c) use of the test results (for specific populations). Validation is a judgment call on the degree to which each of these components is clearly defined and adequately implemented.
Validity is a unitary concept with multifaceted processes of reasoning about a desired interpretation of test scores and subsequent uses of those scores. In this process, we want answers to two important questions, which are identical regardless of whether the students tested have disabilities: (1) How valid is our interpretation of a student’s test score? and (2) How valid is it to use these scores in an accountability system? Validity evidence may be documented at both the item and total test levels. We use the Standards (AERA et al., 2014) in documenting evidence on content coverage, response processes, internal structure, and relations to other variables. This document follows the essential data requirements of the federal government, as needed in the peer review process. The critical elements highlighted in Section 4 of that document (with examples of acceptable evidence) include (a) academic content standards, (b) academic achievement standards, (c) a statewide assessment system, (d) reliability, (e) validity, and (f) other dimensions of technical quality.
In this technical report, data are presented to support the claim that Oregon’s AA-AAAS provides the state technically adequate student performance data to ascertain proficiency on grade-level state content standards for students with significant cognitive disabilities, which is its defined purpose. The AA-AAAS is linked to grade-level academic content, generates reliable outcomes at the test level, includes all students, has a cogent internal structure, and fits within a network of relations within and across various dimensions of content related to, and relevant for, making proficiency decisions. Sample items that convey the design and sample content of ORExt items are provided in Appendix 2.2.3.
The assessments are administered and scored in a standardized manner. Assessors who administer the ORExt are trained to provide the necessary level of support for appropriate test administration on an item-by-item basis. Four levels of support are outlined in training: full physical support, partial physical support, prompted support, and no support. Items were designed to document students’ skill and knowledge on grade-level academic content standards, with the level of support provided designed not to interfere with the construct being measured. Only one test administration type is used for the ORExt, patterned after the former Scaffold version of the assessment. Assessors administer the prompt; if the student does not respond, the assessor reads a directive statement designed to focus the student’s attention upon the test item and then repeats the prompt. If the student still does not respond, the assessor repeats the prompt as needed; if no response follows, the item is scored as incorrect and the assessor moves on to the next item. Training documentation is provided in Appendices 2.3B.1-2.3B.8.
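The decision flow for a single item can be summarized in schematic form. The sketch below simulates the student's behavior with a vector of response indicators; it is only an illustration of the human administration procedure, not ORExt software.

```r
# Schematic of the scaffolded administration flow for one item. The student's
# behavior is simulated by `responds`, a vector of TRUE/FALSE values for each
# prompt attempt; in practice these are live assessor observations.
administer_item <- function(is_correct, responds, max_repeats = 2) {
  # First prompt
  if (responds[1]) return(as.integer(is_correct))  # score 1 correct / 0 incorrect
  # No response: read a directive statement to focus attention, then re-prompt
  for (i in seq_len(max_repeats)) {
    if (responds[i + 1]) return(as.integer(is_correct))
  }
  0L  # no response after repeated prompts: score incorrect, move on
}

administer_item(is_correct = TRUE,  responds = c(FALSE, TRUE, FALSE))   # 1
administer_item(is_correct = FALSE, responds = c(TRUE, FALSE, FALSE))   # 0
administer_item(is_correct = TRUE,  responds = c(FALSE, FALSE, FALSE))  # 0
```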
Given the content-related evidence we present related to test development, alignment, training, administration, and scoring; the reliability information reflected by adequate coefficients for the tests; and the relation of tests across subject areas (providing criterion-related evidence), we conclude that the alternate assessment judged against alternate achievement standards allows valid inferences to be made regarding state accountability proficiency standards.
Our foundation of validity evidence from content coverage for the ORExt comes in the form of test specifications (see Appendix 2.1) and test blueprints (see Appendix 2.1B). Among other things, the Standards (AERA et al., 2014) suggest specifications should “define the content of the test, the proposed test length, the item formats…” (Standard 4.2, p. 85).
All items are linked to grade-level standards, and a prototype was developed using principles of universal design with traditional, content-referenced multiple-choice item writing techniques. The most important component in these initial steps addressed language complexity and access for students using both receptive and expressive communication. Additionally, both content breadth and depth were addressed. We developed one test form for the ORExt that utilizes a scaffold approach, which allows students with very limited attention to access test content, while the supports are not utilized by students who do not need them.
We developed the test iteratively by writing items (see Appendix 2.2.1, which conveys our item writer training materials), piloting them, reviewing them, and editing successive drafts. We used existing panels of veteran teachers who have worked with the Oregon Department of Education (ODE) in various advising roles on testing content in general and special education, applying the same processes and criteria, and introduced newly qualified teachers over time to keep the panels current. Behavioral Research and Teaching (BRT) personnel conducted the internal reviews of content. After the internal development of prototype items, all reviews involved Oregon content and special education experts with significant training and K-12 classroom experience.
The ORExt incorporates continuous improvement into its test design via annual field-testing in all content areas, with an average of 25% new items. These items are compared to operational items based on item functioning and test design factors, generating data used to replace items each year by incorporating new items that fill a needed gap in categorical concurrence or provide a wider range of functioning across complexity levels (low, medium, and high, comparable to Webb’s DOK; see Section 3.1A).
BRT employed a multi-stage development process in 2014-15 to ensure that test items were linked to relevant content standards, were accessible for students with significant cognitive disabilities, and that any perceived item biases were eliminated. The item review process included 51 reviewers with an average of 22 years of experience in education. The ORExt assessments have been determined to demonstrate strong linkage to grade-level academic content, overall. Full documentation of the initial 2014 linkage study and the new, independent alignment study conducted in spring 2017 is provided in Appendix 3.1A. No item development was required in 2016-17.
The summary section of the independent alignment study report states that, “Oregon’s Extended Assessments (ORExt) in English Language Arts, Mathematics, and Science were evaluated in a low-complexity alignment study conducted in Spring of 2017. Averages of reviewer professional judgments over five separate evaluations were gathered, reviewed, and interpreted in the pages that follow. In the three evaluations that involved determining the relationship between standards and items, reviewers identified sufficient to strong relationships among assessment components in all grades and all subject areas. In the two evaluations involving Achievement Level Descriptors, reviewers identified thirty instances of sufficient to strong relationships out of thirty-four possible relationship opportunities resulting in an overall affirmed relationship with areas for refinements identified.”
Because the assessments demonstrate sufficient to strong linkage to Oregon’s general education content standards and descriptive statistics demonstrate that each content area assessment is functioning as intended, it is appropriate to deduce that these standards define the expectations that are being measured by the Oregon Extended assessments.
The Oregon Extended assessments yield scores that reflect the full range of achievement implied by Oregon’s alternate achievement standards. Evidence of this claim is found in the standard setting documentation submitted in Section 6.2. Standards were set for all subject areas on June 15-17, 2015. Standards included achievement level descriptors and cut scores, which define Oregon’s new alternate achievement standards (AAS). The State Board of Education officially adopted the AAS on June 25, 2015.
Complete results of the analysis of the linkage of the new Essentialized Assessment Frameworks (EAFs), composed of Essentialized Standards (EsSt), to grade-level CCSS in English language arts and mathematics, and to ORSci and NGSS in science, are presented in Section 3.1A. The claim is that the EsSt are sufficiently linked to grade-level standards, while the ORExt items are aligned to the EsSt. In addition to presenting linkage information between grade-level content standards and the EsSt, the linkage study presents alignment information relating the items on the new ORExt to the EsSt. The Extended assessments have been determined to link sufficiently to grade-level academic content standards. Field-test items are added each year based on item alignment to standards.
The Oregon Extended assessments link to grade level academic content, as reflected in the item development process. Oregon also had each operational item used on the Oregon Extended assessment evaluated for alignment as part of two comprehensive linkage studies, one performed in 2014 and an independent alignment study performed in 2017 (see Section 3.1A). The professional reviewers in an internal study in 2014 and an independent study in spring 2017 included both special and general education experts, with content knowledge and experience in addition to special education expertise.
According to the independent linkage study report (see Appendix 3.1A), the spring 2017 review was conducted by expert reviewers with professional backgrounds in Special Education (the population), assessment, or Oregon’s adopted content standards. Reviewers were assigned to review grade-level items relative to their experience and expertise. In all, 39 reviewers participated. Thirty-four participated in all five evaluations: thirteen for the English Language Arts review, fifteen for the Mathematics review, and six for the Science review. All participants were assigned to at least one specific content area, as shown in Table 1; four individuals were assigned to two areas of review.

The thirty-nine individuals who participated in the study had a robust legacy of experience in the field and in the state. Participants represented 25 unique school districts across the state, reflecting both urban and rural perspectives. All 39 individuals held current teaching licenses, and two also held administrative licenses. Years of experience ranged from 3 to 30, with an average of 17 years (Mode = 11 years, Median = 16 years); one individual indicated 50 years of experience in the field. Three of the 39 individuals held a Bachelor’s degree only, thirty-six held a Bachelor’s degree and at least one Master’s degree, and two held a Bachelor’s degree, at least one Master’s degree, and a doctoral degree. Fourteen (36%) of the individuals identified as experts in a specific content area, and 25 (64%) identified special education as their primary area of expertise.
These reviewers were trained via synchronous webinars on linkage/alignment, as well as item depth, breadth, and complexity, and then completed their ratings online via BRT’s Distributed Item Review (DIR) website and in Excel spreadsheets shared with the researcher electronically (see Appendix 3.1B for an overview). Mock linkage ratings were conducted in order to address questions and ensure appropriate calibration. Reviewers rated each essentialized standard on a 3-point scale (0 = no link, 1 = sufficient link, 2 = strong link) as it related to the standard the test developers had defined for that essentialized standard. Items were evaluated, in turn, based upon their alignment to the essentialized standard on a 3-point scale (0 = insufficient alignment, 1 = sufficient alignment, 2 = strong alignment). When averaged across reviewers, 1.00-1.29 was considered in the low range, 1.30-1.69 sufficient, and 1.70-2.00 strong. Additional comment was requested for any essentialized standard or item whose linkage was rated 0.
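The mapping from averaged reviewer ratings to the reported categories can be expressed directly. A minimal sketch follows; the label for averages below 1.00 is an assumption, since the study requests comments for ratings of 0 rather than naming that band.

```r
# Maps averaged 0/1/2 reviewer ratings onto the reported linkage categories:
# 1.00-1.29 low, 1.30-1.69 sufficient, 1.70-2.00 strong. The "below low"
# label for averages under 1.00 is an assumption for illustration.
linkage_category <- function(mean_rating) {
  cut(mean_rating,
      breaks = c(-Inf, 1.00, 1.30, 1.70, Inf),
      labels = c("below low", "low", "sufficient", "strong"),
      right  = FALSE)  # left-closed intervals: 1.30 falls in "sufficient"
}

linkage_category(mean(c(2, 1, 2, 1)))  # average 1.50 -> "sufficient"
linkage_category(1.85)                 # -> "strong"
```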
Overall, the 2017 independent alignment study concludes that: “First, reviewers were asked to conduct an affirmational review of the rationale used by test developers to omit certain content standards. This finding was used to infer that the final standards selected for inclusion or omission in Oregon’s Extended Assessment were chosen rationally and that the final scope of content standards can be considered justifiable for the population for the subject area. Conclusion: This review, with a lowest average rate of .82 (on a scale of 1), permits the inference: the scope of the standards selected for translation to Essentialized Standards were rationally selected. None of the standards de-selected (for inaccessibility or for being covered elsewhere) were strongly identified for re-inclusion, nor were identified as a critical hole for this population of students.

Second, reviewers were asked to identify the strength of the link between the source standard and the Essentialized Standard. This finding was used to infer that the process undertaken to essentialize a given Source Standard did not fundamentally or critically alter the knowledge or skill set intended by the source standard for this population of students (further confirming that the content selected for assessment is comparable). Conclusion: This review, with a range of 1.5 - 1.9 (on a scale of 2), permits the inference: the Essentialized Standards were found to link sufficiently to the source standards, on average beyond the “sufficient” average of 1.0.

Third, reviewers were asked to identify the strength of the alignment between the Essentialized Standards and the items, and to review the items developed using the Essentialized Standards for bias and accessibility. The finding from this review was used to infer that the items written for this grade and subject area (using these Essentialized Standards) were adequately linked to the Essentialized Standards, were free from bias, and were accessible to students with significant cognitive disabilities. Conclusion: The alignment review (1.32 - 1.89), accessibility review (.67 - 1.0), and freedom from bias review (.65 - 1.0) all permit the inference that the test items indicate a relationship with the source standards, the test items are not overly biased towards or against any particular group of individuals, and the test items are written such that the content and intent can be accessed by students with the most significant cognitive disabilities. (Note: this range was skewed by feedback from one reviewer (ELA, Grade 3) whose comments were noted in this study. Removing that individual’s comments would result in ranges of .90 - 1.0 for accessibility and .89 - 1.0 for freedom from bias, respectively.)

Fourth, reviewers were asked to review the statements used to describe student achievement on the test (the Achievement Level Descriptors) and their alignment to the Essentialized Standards that the students were tested on. The finding from this review was used to infer that the skills and achievements described by the Achievement Level Descriptors for each subject and grade level are aligned with the content standard being measured. Conclusion: The reviews, ranging from .68* - 1.0, permit the inference that the descriptions made regarding student skillset are an accurate reflection of the standards from which the assessment was developed at all three levels evaluated. (*One outlier for ELA, Grade 4 provided a review of a .52 average.)
Fifth, and finally, reviewers were asked to review the alignment of the Achievement Level Descriptors to the items. The finding from this review was used to infer that each item in the developed assessment(s) was appropriately aligned to its associated Achievement Level Descriptor (further confirming that decisions made using this test were aligned with the intent of the source standard). Conclusion: Fourteen of the seventeen grade-level reviews resulted in an average reviewer range of .67 - 1.0 indicating an appropriate alignment between ALDs and the items as written. This review permits the inference that, overall, the Achievement Level Descriptors are accurate reflections of the items. In three instances (Mathematics-Grades 3 and 4, and ELA-Grade 8) the average alignment by reviewer was .5 (indicating that one of the two individuals in that category did not agree that the items and ALDs were aligned)."
Evidence of content coverage is concerned with judgments about “the extent to which the content domain of a test represents the domain defined in the test specifications” (AERA et al., 2014, Standard 4.12, p. 89). As a whole, the ORExt is comprised of sets of items that sample student performance on the intended domains. The expectation is that the items cover the full range of intended domains, with a sufficient number of items so that scores credibly represent student knowledge and skills in those areas. Without a sufficient number of items, the potential exists for a validity threat due to construct under-representation (Messick, 1989).
The ORExt assessment is built upon a variety of items that address a wide range of performance expectations rooted in the CCSS, NGSS, and ORSci content standards. The challenge built into the test design is based first upon the content within each standard in English language arts, mathematics, and science. That content is RDBC in a manner that is verified by Oregon general and special education teachers to develop assessment targets that are appropriate for students with the most significant cognitive disabilities. Our assessments utilize universal design principles in order to include all students in the assessment process, while effectively challenging the higher performing students. For students who have very limited to no communication and are unable to access even the most accessible items on the ORExt, an Oregon Observational Rating Assessment (ORora) was first implemented in 2015-16. The ORora is completed by teachers and documents the student’s level of communication complexity (expressive and receptive), as well as level of independence in the domains of attention/joint attention and mathematics. A complete report of ORora results from 2016-17 is provided in Appendix 5.1D.
Fifty-one reviewers analyzed all ORExt items for bias, sensitivity, accessibility to the student population, and alignment to the Essentialized Standards. A total of 21 reviewers were involved in the English language arts item reviews. An additional 21 reviewers were involved in the Mathematics item reviews. Science employed nine reviewers. Reviewers were organized into grade level teams of two special educators and one content specialist.
The linkage/alignment studies documented above, including reviews of linkage, content coverage, and depth of knowledge, provide substantive evidence that the ORExt items tap the intended cognitive processes and are written at the appropriate grade level. A comprehensive report of the item review process is available in Appendix 3.1A.
The Oregon Extended assessments reflect patterns of emphasis that are supported by Oregon educators, as indicated by the following three tables, which highlight the balance of standard representation by grade level for English language arts, mathematics, and science on the ORExt. The representation ratios can be calculated by dividing the number of Essentialized Standards in a domain by the total within each respective column. For example, in Grade 3 Reading, approximately 25% of the items are in the Reading Standards for Literature domain, as that domain has 4 written Essentialized Standards (EsSt) out of the total of 16 (4/16 = 25%).
The test blueprints below directly correspond to the number of EsSt written in each domain within the Essentialized Assessment Frameworks (EAF) spreadsheets. There are additional grade level standards addressed by the EsSt, as some EsSt link to multiple grade level content standards. The blueprints below reflect only the written EsSt, however, and thus understate the breadth of grade level content addressed by the ORExt.
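A minimal sketch of the representation-ratio calculation described above, using the Grade 3 Reading example; the domain names and counts, apart from the 4-of-16 Literature figure cited above, are hypothetical.

```r
blueprint <- data.frame(
  domain = c("Reading: Literature", "Reading: Informational Text",
             "Foundational Skills", "Language"),   # hypothetical domains
  n_esst = c(4, 4, 4, 4)                           # written EsSt per domain
)
blueprint$proportion <- blueprint$n_esst / sum(blueprint$n_esst)
blueprint   # Reading: Literature -> 4/16 = 25%, as in the example above
```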
The primary purpose of the ORExt assessment is to yield technically adequate performance data on grade level state content standards for students with significant cognitive disabilities in English language arts, mathematics, and science at the test level. All scoring and reporting structures mirror this design and have been shown to be reliable measures at the test level (see Section 4.1). The process of addressing any gaps or weaknesses in the system is accomplished via field-testing (see Section 3.1A).
Distributions of point measure correlations and outfit mean square statistics for operational items are provided below, by content area and grade. Point measure correlations indicate how strongly each item’s scores correlate with the overall latent score, while outfit mean square statistics closer to 1.0 denote minimal distortion of the measurement system. All items included in the 2016-17 operational assessment are represented. Point measure correlations in ELA ranged from 0.42 to 0.74. All data visualizations were conducted with ggplot2 in the tidyverse package (Wickham, 2017).
Point measure correlations for ELA (Grades 3-11) here:
Point measure correlations for Math (Grades 3-11) here:
Point measure correlations for Science (Grades 5, 8, & 11) here:
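The distributions above could be produced along the following lines. This is a sketch that assumes an `items` data frame with one row per operational item and columns `grade`, `pt_measure_r`, and `outfit_msq` (all assumed names).

```r
library(tidyverse)  # loads ggplot2 (Wickham, 2017)

ggplot(items, aes(x = pt_measure_r)) +
  geom_histogram(binwidth = 0.05) +
  facet_wrap(~ grade) +
  labs(x = "Point measure correlation", y = "Number of items",
       title = "ELA operational items, 2016-17")
```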
Outfit mean square values below 1.0 demonstrate that values are too predictable and perhaps redundant, while values above 1.0 indicate unpredictability. Items above 2.0 are deemed insufficient for measurement purposes and flagged for replacement. While most OMS values in ELA were between 0.5 and 1.5, one item in Grade 6 was above 2.0 and was removed.
Outfit Mean Square for ELA here:
Outfit Mean Square for Math here:
Outfit Mean Square for Science here:
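The flagging rule described above amounts to a one-line filter; this sketch reuses the assumed `items` data frame from the earlier example.

```r
flagged <- subset(items, outfit_msq > 2.0)  # insufficient for measurement
nrow(flagged)  # e.g., the single Grade 6 ELA item removed in 2016-17
```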
Annual Measurable Objective (AMO) calculations were conducted based upon student performance on the ORExt, tied to the vertical scale using Rasch modeling. Overall results are largely consistent with 2015-16, with approximately 50% of students with significant cognitive disabilities achieving proficiency across grades and content areas. ELA results are presented in blue, mathematics in dark green, and science in red. The data visualizations presented below were conducted with ggplot2 in the tidyverse package (Wickham, 2017).
AMO for ELA Grade 3 here:
AMO for ELA Grade 4 here:
AMO for ELA Grade 5 here:
AMO for ELA Grade 6 here:
AMO for ELA Grade 7 here:
AMO for ELA Grade 8 here:
AMO for ELA Grade 11 here:
AMO for Math Grade 3 here:
AMO for Math Grade 4 here:
AMO for Math Grade 5 here:
AMO for Math Grade 6 here:
AMO for Math Grade 7 here:
AMO for Math Grade 8 here:
AMO for Math Grade 11 here:
AMO for Science Grade 5 here:
AMO for Science Grade 8 here:
AMO for Science Grade 11 here:
Some concerns are noted in mathematics, where relatively higher percentages of students are scoring at Level 1 and very few at Level 2. However, this finding is consistent with the range of possible scores, as Level 2 in some cases spans only two scale score points (e.g., Grade 7, where Level 2 covers scaled scores 207-208). The addition of 1-2 low complexity items per assessment will therefore be implemented in mathematics to address this concern as well.
Perhaps the best model for understanding criterion-related evidence comes from Campbell and Fiske (1959) in their description of the multitrait-multimethod analysis [we translate the term ‘trait’ to mean ‘skill’]. In this process, (several) different traits are measured using (several) different methods to provide a correlation matrix that should reflect specific patterns supportive of the claim being made (that is, provide positive validation evidence). Sometimes these various measures are of the same or similar skills, abilities, or traits, and other times they are of different skills, abilities, or traits. We present data that quite consistently reflect higher relations among items within an academic subject than between academic subjects. We also present data in which performance on items is totaled within categories of disability, expecting relations that would reflect appropriate differences (see Tindal, McDonald, Tedesco, Glasgow, Almond, Crawford, & Hollenbeck, 2003).
Criterion validity information is difficult to document with AA-AAAS, as most SWSCD do not participate in any standardized assessment outside of the ORExt and/or ORora in Oregon. Divergent validity evidence is garnered via comparisons of ORExt results to ORora outcomes, which show that students whose ORExt assessments are discontinued exhibit serious limitations in attention, basic math skills, and receptive and expressive communication skills. The median ORExt ELA score for SWSCD who participated in the ORora was 4.0, the median mathematics ORExt score was 4.0, and the median science ORExt score was 0.0. Pearson correlations between the total raw scores on the ORExt and the total raw score on the ORora were computed to address the relationship between total performance on each assessment. The correlation between ELA and ORora scores was 0.56, between Math and ORora scores was 0.52, and between Science and ORora scores was 0.33. As expected, the ORora results provide divergent validity evidence for the ORExt: we would not expect a strong relationship between the scores, as students whose ORExt testing is discontinued are generally unable to access the academic content on the ORExt, even with the requisite reductions in depth, breadth, and complexity.
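A minimal sketch of these divergent-validity correlations, assuming a merged data frame `dat` of students who took both assessments (all column names are hypothetical):

```r
cor(dat$ela_raw,  dat$orora_raw, use = "pairwise.complete.obs")  # ~0.56
cor(dat$math_raw, dat$orora_raw, use = "pairwise.complete.obs")  # ~0.52
cor(dat$sci_raw,  dat$orora_raw, use = "pairwise.complete.obs")  # ~0.33
```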
Convergent evidence that the ORExt is assessing appropriate academic content is provided by QA and QT responses to the consequential validity survey. Respondents to the survey generally agree that, “The items in the Oregon Extended Assessment accurately reflect the academic content (what the student should know) that my students with significant cognitive disabilities should be learning, as defined by grade level content standards (CCSS/NGSS) and the Essentialized Assessment Frameworks” (11% Strongly Agree & 54% Agree). In addition, they also agreed with the statement that, “The items in the Oregon Extended Assessment, which primarily ask students to match, identify, or recognize academic content, are appropriate behaviors to review to determine what my students with significant cognitive disabilities are able to do” (18% Strongly Agree & 64% Agree). The consequential validity results demonstrate that the ORExt is sampling academic domains that the field of QAs and QTs deem appropriate in the area of academics.
We conducted correlational analyses to further explore the validity of the ORExt. We first describe the purpose of the analysis, as well as our anticipated results. We then discuss our observed results before concluding with an overall evaluative judgment of the validity of the test.
In the correlational analysis, we explore the correlations among students’ total scores across subject areas. The purpose of the analysis was to investigate how strongly students’ scores in one subject were related to their scores in other subjects. If the correlations were exceedingly high (e.g., above .90), it would indicate that the score a student receives in an individual subject has less to do with the intended construct (e.g., reading) than with factors idiosyncratic to the student. For example, if all subject areas correlated at .95, that would provide strong evidence that the tests were measuring a global student-specific construct (e.g., intelligence), and not the individual subject constructs. We would expect, however, that the tests would correlate quite strongly given that the same students were assessed multiple times; moderately strong correlations (e.g., 0.7) would be expected simply because the within-subject design captures idiosyncratic variance associated with the individual student.
Full results of the Pearson’s product-moment correlation analysis by content area and grade level are reported below. The results are statistically significant, yet the overall correlations across content areas suggest that we are indeed measuring different, though strongly related, constructs, with between-test scaled score correlations ranging from 0.81 to 0.89.
Grade 3 Content Area Correlations table here:
Grade 4 Content Area Correlations table here:
Grade 5 Content Area Correlations table here:
Grade 6 Content Area Correlations table here:
Grade 7 Content Area Correlations table here:
Grade 8 Content Area Correlations table here:
Grade 11 Content Area Correlations table here:
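A sketch of the cross-subject analysis behind the tables above, assuming a `scores` data frame with one row per student and scaled scores by subject (column names are assumptions):

```r
round(cor(scores[, c("ela_ss", "math_ss", "science_ss")],
          use = "pairwise.complete.obs"), 2)
# Off-diagonal values in the 0.81-0.89 range correspond to the results above.
```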
Results of the Pearson’s product-moment correlation analysis within English language arts (ELA, Reading, and Writing) are reported below and suggest high correlations between ELA and Reading, as expected, from .95 to .97. Writing correlates with ELA from .91 to .94 and with Reading from .79 to .88.
English Language Arts Subscore Correlations table here:
The ORExt assessments appear to be measuring separate constructs, as intended, as indicated by these correlations. No unexpected, consistent patterns of differential test functioning are present for student characteristics that should be unrelated to performance, such as gender and ethnicity. Student performance appears to be primarily related to item difficulty and not to the construct-irrelevant aspects of the items that have been reviewed.
Test reliability can be viewed through several lenses, all of which document how consistently an assessment performs across occasions, contexts, and raters. Typical strategies for addressing reliability include documentation of internal consistency, split-half reliability, and test-retest reliability. If multiple forms are implemented, test form reliability documentation is also requisite. The implementation plan for the ORExt includes initial documentation of internal consistency (Cronbach’s alpha). The 2015-16 technical report will include internal consistency estimates, split-half reliability analyses, and a small test-retest assessment of reliability by means of our pilot tablet administration study. There is only one test form for the ORExt, so test form comparisons are not possible.
Marginal reliability results (the ratio of true score variance to the sum of true score variance and error variance) demonstrate that the tests are quite reliable at the total test level. Full reliability statistics for each of the operational tests administered this year are provided below. These results demonstrate that the total test reliabilities were quite high, ranging from .87 to .92. Each table below provides the content area, grade, and the marginal reliabilities. All test forms were composed of 36 operational and 12 embedded field-test items.
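In symbols, with \(\sigma^2_T\) denoting true score variance and \(\sigma^2_E\) error variance, the statistic reported in each table below is:

\[\rho_{marginal} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}\]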
The test reliabilities for ELA were in the high range, from .87 to .92.

| Grade | Marginal Reliability |
| :---- | :------------------- |
| 3     | 0.92                 |
| 4     | 0.92                 |
| 5     | 0.91                 |
| 6     | 0.91                 |
| 7     | 0.90                 |
| 8     | 0.91                 |
| 11    | 0.87                 |
The test reliabilities for mathematics were in the high range, from .88 to .91.

| Grade | Marginal Reliability |
| :---- | :------------------- |
| 3     | 0.91                 |
| 4     | 0.91                 |
| 5     | 0.90                 |
| 6     | 0.90                 |
| 7     | 0.91                 |
| 8     | 0.88                 |
| 11    | 0.90                 |
The test reliabilities for science were in the high range, from .87 to .91.

| Grade | Marginal Reliability |
| :---- | :------------------- |
| 5     | 0.91                 |
| 8     | 0.88                 |
| 11    | 0.87                 |
The test information functions published below also indicate that the scales exhibit reliability greater than or equal to .80 at all proficient-level cut scores.
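For reference, under the common convention that the latent trait is scaled to unit variance, test information translates to conditional reliability as

\[r(\theta) = 1 - SE(\theta)^2 = 1 - \frac{1}{I(\theta)},\]

so a conditional reliability of .80 at a cut score corresponds to test information of at least 5.0 at that point.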
Grade 3 ELA TIF here:
Grade 4 ELA TIF here:
Grade 5 ELA TIF here:
Grade 6 ELA TIF here:
Grade 7 ELA TIF here:
Grade 8 ELA TIF here:
Grade 11 ELA TIF here:
Grade 3 Math TIF here:
Grade 4 Math TIF here:
Grade 5 Math TIF here:
Grade 6 Math TIF here:
Grade 7 Math TIF here:
Grade 8 Math TIF here:
Grade 11 Math TIF here:
Grade 5 Science TIF here:
Grade 8 Science TIF here:
Grade 11 Science TIF here:
The test characteristic curves (TCCs) for the grade-level assessments in ELA and mathematics demonstrate incrementally increasing test demands across Grades 3-8, with the exception of Grade 7 mathematics. The Grade 7 mathematics assessment was revised to be more difficult last year, but further revision is clearly needed to address its location on the TCC. Grade 11 and science tests are not vertically scaled; TCCs are thus not presented for them. All Rasch model scaling, as well as the data visualizations for the TCCs, was conducted in the R software 3.3.2 environment (R Core Team, 2016) using the r2Winsteps package (Anderson, 2017).
Test Characteristic Curve for ELA here:
Test Characteristic Curve for Math here:
The average SEM associated with each cut score for 2016-17 student data is presented in the table below, supported by a key. The SEMs decreased in almost all cases compared to last year, suggesting that the measures are more reliable when student eligibility is more strictly controlled. See Section 4.2 below for means and standard deviations by grade and subject area.

KEY: SEM = Standard Error of Measure associated with the cut score to its left, averaged to the tenths place. Level 1 = Does Not Yet Meet (not included, as it is the lowest level); Level 2 = Nearly Meets; Level 3 = Meets; Level 4 = Exceeds.
English Language Arts table here:
Mathematics table here:
Science table here:
Results from the 2016-17 ORExt test administration were analyzed using Rudner’s classification index (Rudner, 2005). Values closer to 1.0 indicate a higher likelihood that a student was appropriately classified as proficient or not proficient (accuracy) and a higher likelihood that the student would be classified in the same category on an additional test administration (consistency). The calculation utilizes item difficulty and theta value distributions, as well as the related standard errors of measurement, to generate probabilistic estimates based on one test administration. Complete results, generated with the cacIRT package in R, are provided below. The results denote very high levels of classification accuracy and consistency.
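The operational analyses used the cacIRT package; the hand-rolled sketch below only illustrates the logic of Rudner’s index under a normal approximation, assuming vectors of estimated abilities (`theta`), their standard errors (`se`), and sorted theta-metric cut scores (`cuts`), all hypothetical names.

```r
rudner <- function(theta, se, cuts) {
  bounds <- c(-Inf, cuts, Inf)
  k <- length(bounds) - 1                 # number of achievement levels
  # p[i, j]: probability that examinee i's true ability falls in level j
  p <- sapply(seq_len(k), function(j)
    pnorm(bounds[j + 1], theta, se) - pnorm(bounds[j], theta, se))
  obs <- findInterval(theta, cuts) + 1    # observed level assignments
  accuracy    <- mean(p[cbind(seq_along(theta), obs)])  # P(true == observed)
  consistency <- mean(rowSums(p^2))       # P(same level on an independent retest)
  c(accuracy = accuracy, consistency = consistency)
}

# e.g., rudner(theta_hat, se_hat, cuts = c(-0.8, 0.0, 0.9))  # hypothetical cuts
```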
ELA Test Classification Accuracy table here:
ELA Test Classification Consistency table here:
The ORExt is not a computer-adaptive instrument so estimate precision documentation based upon that test design is not provided.
The state has taken steps to ensure fairness in the development of the assessments, including an analysis of each test item by Oregon teachers not only for linkage to standards, but also for access, sensitivity, and bias (see Appendix 3.1A). In addition, we reviewed test functioning as relevant to race/ethnicity and disability subgroups. This process increases the likelihood that students are receiving instruction in areas reflected in the assessment, and also that the items are not biased toward a particular demographic or sub-group.
To investigate Differential Item Functioning (DIF), the Mantel-Haenszel test using a purification process was conducted (Holland & Thayer, 1988; Kamata & Vaughn, 2004) with the R software using the difR package (Magis et al., 2013). When using the Mantel-Haenszel test to investigate DIF, contingency tables are constructed, and the resulting odds for the focal group answering the item correctly are compared to the odds for the reference group. Given n-size limitations (Scott et al., 2009), we were able to conduct two analyses: a) White/Non-White and b) Male/Female. Whites and Males were the focal groups and Non-Whites and Females were the reference groups, respectively. The contingency table summarizes correct and incorrect responses to each item by respondents’ total raw score by subgroup (Kamata & Vaughn, 2004). If there is no difference in performance for the two groups, the odds ratio of the focal group performance to reference group performance will equal one. An odds ratio greater than one means the focal group is performing better than the reference group, with the opposite being true for odds ratios less than one.
The difR package contains a built-in algorithm to conduct purification automatically, so we were interested in how this algorithm functioned relative to the iterations conducted manually using SPSS. We used criteria outlined by the Educational Testing Service (ETS) for DIF Classification (Holland & Thayer, 1988) to determine whether or not items exhibited DIF, as the difR package reports delta values by default, defined as \[\Delta_{MH} = -2.35\ln(\alpha_{MH})\]
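A sketch of such a run with the difR package, assuming `resp` is the scored 0/1 item response matrix and `grp` a vector of group labels (both assumed objects):

```r
library(difR)

# Mantel-Haenszel DIF with internal purification; per the design above,
# "White" is treated as the focal group in the race-based analysis.
mh <- difMH(Data = resp, group = grp, focal.name = "White", purify = TRUE)
mh  # reports MH statistics and ETS delta-scale effect sizes (A/B/C)
```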
The Holland and Thayer criteria were used for all Mantel-Haenszel analyses. Items that were flagged as “C” level items were reviewed by BRT researchers for potential biases. If biases are identified, the item is removed from the item pool. DIF analyses were performed ex post facto on the 2015-16 ORExt operational items to address longitudinal trends. Only three ELA items were identified as exhibiting a “C” level DIF across both 2016 and 2017. Those three ELA items, one in Grade 5 that exhibited DIF that privileged White examinees, one in Grade 4 that privileged Female examinees, and one in Grade 8 that privileged Female examinees, were removed and will not be used in 2017-18 or thereafter. DIF analyses will also be performed in the 2017-18 school year to continue to address DIF longitudinally. All items, including field test items, were included in the analyses. There are a total of 48 items on each assessment.
Within the White/Non-White analysis, 10 out of 18 items flagged as “C” level items privileged Non-White test participants in ELA, 2 out of 5 privileged Non-White test participants in Mathematics, and 2 out of 7 privileged Non-White test participants in Science. Overall, DIF flagging based on race was relatively balanced, with 14 items privileging students who were Non-White and 16 privileging students who were White.
White/Non-White DIF Analyses Results table here:
In terms of the Male/Female analyses, 10 out of 16 items flagged as “C” level items privileged Females in ELA, 4 out of 9 flagged items privileged Females in Mathematics, and 8 out of 11 flagged items privileged Females in Science. Overall, DIF flagging based on sex was relatively balanced, with 22 privileging Females and 14 privileging Males.
Male/Female DIF Analyses Results table here:
The full ethnic and disability demographics for students taking the ORExt are reported below. Student ethnicity/race was reported in seven categories: (a) American Indian/Alaskan Native, (b) Asian, (c) Black or African-American, (d) Multi-ethnic, (e) Native Hawaiian or Other Pacific Islander, (f) Hispanic, or (g) White. The majority of students were reported as White (55-62%) or Hispanic (22-29%). These results are largely consistent with the demographics reported for the general assessments, though percentages taking the ORExt are slightly higher for most students of color and generally lower for students who are Asian or White (see Appendix 4.2).
English Language Arts (Ethnicity Race) table here:
Mathematics (Ethnicity Race) table here:
Science (Ethnicity Race) table here:
Students’ reported exceptionalities included Intellectual Disability (ID), Hearing Impairment (HI), Visual Impairment (VI), Deaf-Blindness (DB), Communication Disorder (CD), Emotional Disturbance (ED), Orthopedic Impairment (OI), Traumatic Brain Injury (TBI), Other Health Impairment (OHI), Autism Spectrum Disorder (ASD), and Specific Learning Disability (SLD). The majority of students who participated in the ORExt were students with ID (30-45%) and students with ASD (28-34%), followed by students with OHI (11-16%). ODE policy for 2015-16 changed to require students who participate in the ORExt to take the assessment in all relevant content areas. There is thus very little change in terms of participation percentages across content areas, as evidenced by the total n-sizes per grade level displayed below.
English language arts Exceptionality table here:
Mathematics Exceptionality table here:
Science Exceptionality table here:
The following tables provide information regarding observed means and standard deviations by content area and grade level. The Grade 3-8 English language arts and mathematics scaled scores are centered on 200, while all Grade 11 scores are centered on 900 (to reinforce that they are not on the vertical scale). Science is centered on 500 at Grade 5 and on 800 at Grade 8. The vertically scaled scores generally convey incremental gains in achievement across grade levels, though the results suggest small losses at Grade 8 in ELA. These centering values were selected to make clear which scores are on the same scale and to differentiate among the statewide assessments in use (i.e., SBA, OAKS, ORExt, ELPA, KA), to avoid confusion. The general pattern is that RIT scores decreased from 2014-15 to 2015-16. This decrease is attributed not to the scale, nor to a deceleration of growth, but to the substantive shift in the tested student population as a result of ODE eligibility guidelines. The scale appears to have stabilized from 2015-16 to 2016-17 because the tested student population was more consistent.
2014-15 RIT Scores table here:
2015-16 RIT Scores table here:
2016-17 RIT Scores table here:
The following tables provide information regarding average student performance by grade level and sex (Female/Male) in each of the content areas assessed on the ORExt. Significant differences based on a Welch two sample t-test are noted in Grade 4 ELA, Grades 4, 5, and 8 in mathematics and Grades 5 and 8 in science.
English Language Arts table here:
Mathematics table here:
Science table here:
The following tables provide information regarding average student performance by grade level and race (Non-White/White) in each of the content areas assessed on the ORExt. Significant differences are noted by two sample t-tests in ELA Grades 3 and 8 and in Grade 8 science.
English Language Arts table here:
Mathematics table here:
Science table here:
The following tables provide information regarding average student performance by grade level and exceptionality category in each of the content areas assessed on the ORExt. Students with SLD were generally the highest performing group, though students with CD and ED performed higher at certain grade levels/content areas. The lowest performing group was consistently students with OI, followed by students with ID or ASD, depending upon grade level.
English Language Arts table here:
Mathematics table here:
Science table here:
The graphs below convey information similar to that shared above in graphic form.
The graphics include 95% confidence interval error bars, so significant differences between subgroups are readily apparent from the error bars: error bars that do not overlap on the y-scale indicate a significant difference. Only subgroups that generally had more than 10 members at each grade level are reported, which required the removal of graphs for students in the HI, VI, DB, and TBI categories.
Students with OI are again the lowest performing group, being significantly outperformed by all other subgroups. Students with SLD are consistently outperforming most peers, with students with ED and CD performing at similarly high levels.
Students with OI are consistently the lowest performing group, which led to concerns regarding test accessibility. However, the results of last year’s consequential validity study demonstrated that the OI label alone does not convey the severity and range of concomitant disabilities experienced by students whose primary label is OI.
Average (All Grade) ELA RIT Scores By Exceptionality here:
Average (Grade 11) ELA RIT Scores By Exceptionality here:
Average (All Grade) Math RIT Scores By Exceptionality here:
Average (Grade 11) Math RIT Scores By Exceptionality here:
Average (Grade 5) Science RIT Scores By Exceptionality here:
Average (Grade 8) Science RIT Scores By Exceptionality here:
Average (Grade 11) Science RIT Scores By Exceptionality here:
The ORExt was redesigned in 2014-15 to support growth determinations in Grades 3-8 in English language arts and mathematics; the initial vertical scale was developed using a balanced design. Now that the assessment is in its third year of administration, it is possible to model growth expectations in ELA and Math for SWSCD who took the ORExt. The following graphs convey the average growth expectations for SWSCD in Oregon and should provide some context for understanding typical performance and average growth in Individualized Education Program (IEP) meetings.
The ODE changed the eligibility criteria for SWSCD to participate in the ORExt in the 2015-16 school year. This had an impact on the tested population, as the expectations were more prescriptive, and student populations decreased by an average of 40% in each content area and grade level tested. This change also affected ORExt test results, as the students who participated in the first administration but not in subsequent administrations were generally very high achieving. To generate growth estimates that matched the intended student population for the ORExt, namely students who did not exit the assessment after the 2015 administration, all datasets for growth modeling excluded the group of students who participated in only the 2015 administration. Students whose grade level advancement was not typical were also excluded (n = 18 exclusions each in ELA and mathematics). All other participants were retained.
The observed cohort means are represented below for comparison purposes. In ELA, Grade 3 scores average a RIT score of 205.72. By Grade 8, the average RIT score in ELA is 218.99. In terms of observed means, students thus grow a total of 13.27 RIT score points from Grades 3 to 8 in ELA, for average annual growth of 2.21 RIT score points. In mathematics, the average Grade 3 RIT score was 193.20. By Grade 8, the average score was 205.78. Students’ observed means thus increased by 12.58 RIT score points, for average annual growth of 2.10 RIT score points.
English Language Arts Observed Means 2015 - 2017 by Cohort table here:
Mathematics Observed Means 2015-2017 by Cohort table here:
Observed means hide a substantial amount of information, however, as they do not account for the variance in scores that exists in the population. We thus conducted unconditional growth models to parse out the variance associated with each intercept and slope estimate. We included multiple cohorts to address the observed non-linearity in the growth estimates. All data preparation and analyses were conducted in the R software 3.3.2 environment (R Core Team, 2016) using the lme4 package (Bates, Maechler, Bolker, & Walker, 2015). In addition, the data visualizations below were conducted with ggplot2 in the tidyverse package (Wickham, 2017). Cohort effects were addressed by averaging across overlapping grades; however, the process of averaging over cohorts should continue annually.
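A minimal sketch of the unconditional growth model, assuming long-format data `d` with one row per student-by-year record and a grade variable centered at Grade 3 (all names hypothetical):

```r
library(lme4)

# Random intercepts and slopes; grade_c = grade - 3, so the intercept is
# expected Grade 3 status and the slope is average annual RIT growth.
m <- lmer(rit ~ 1 + grade_c + (1 + grade_c | student_id), data = d)
summary(m)
```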
Unconditional Model-Predicted ELA Means 2015 - 2017 by Cohort table here:
English Language Arts Growth (RIT Score by Grade) Cohort here:
Unconditional Model-Predicted Mathematics Means 2015 - 2017 by Cohort table here:
Mathematics Growth (RIT Score by Grade) Cohort here:
The unconditional growth estimates show interesting cohort effects, with Cohort 3 being a very high achieving cohort in both ELA and mathematics. These cohort effects are worthy of further study and imply that caution should be used when interpreting growth estimates for the ORExt for specific applications. When averaging across cohorts, students in ELA achieved a RIT score of 206.21 points in Grade 3 and grew to a RIT score of 219.38 by Grade 8; the average growth was 2.10 RIT score points per year. When averaging across cohorts, students in Math achieved a RIT score of 193.72 points in Grade 3 and grew to a RIT score of 205.51 by Grade 8; the average growth was 2.19 RIT score points per year. Curvilinearity is noted in the ELA data, however, with more growth occurring at the earlier grades than at the later grades. Mathematics growth appears to be more linear.
ORora Change Scores from 2016 to 2017

The ORora total raw scores from 2016 and 2017 were compared to determine how much change was exhibited from the first administration of the ORora in 2016 to the second administration in 2017. A total of 849 students participated in the ORora in 2016 and a total of 772 participated in 2017. Only 473 of those students participated in the ORora in both years; the plots below include those 473 students. The range of possible scores on the ORora is 20 to 80. The mean score in 2016 was 46.12, while in 2017 the mean was 48.08. The average change from 2016 to 2017 on the ORora was 1.827 points, but there was great variation in change scores (min = -60, max = +40).
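A sketch of the matched change-score calculation, assuming per-year data frames with a shared student identifier (all names hypothetical):

```r
matched <- merge(orora_2016, orora_2017, by = "student_id",
                 suffixes = c("_2016", "_2017"))
matched$change <- matched$total_2017 - matched$total_2016
mean(matched$change)    # ~1.8 points, per the results above
range(matched$change)   # -60 to +40
```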
2016 ORora Results here:
2017 ORora Results here:
2016-17 ORora Change Scores here:
The ORExt is designed to sample the Common Core State Standards in English language arts (Reading, Writing, and Language) and Mathematics, as well as the Oregon Science Standards and Next Generation Science Standards in science, in a purposive, validated manner. The ORExt test blueprints convey the balance of representation exhibited by the assessment (see Appendix 2.1B). These test blueprints are supported by the ORExt Essentialized Assessment Frameworks (Test blueprints link), which define the assessable content on the ORExt that has been reduced in depth, breadth, and complexity (RDBC) using our defined process (see Appendix 2.3A.3). The decisions regarding which standards to target for essentialization, as well as the strength of linkage between the Essentialized Standards and the CCSS/ORSci/NGSS, have been validated by Oregon teachers, as well (see Appendix 3.1A).
Though a simplified and standardized approach was taken to design items, and efficiency and access to the assessment increased for the majority of students (as evidenced by the decreased percentages of zero scores across all content areas), a small subgroup of students remains who cannot access an academic assessment. This is true even though items have been significantly RDBC at three levels of complexity (low, medium, and high). In response, ODE commissioned BRT to design and implement an observational rating scale for this group of very low-performing students, the Oregon Observational Rating Assessment (ORora), for the spring 2016 administration. The ORora targets communication (expressive and receptive) and basic skills (attention/joint attention and mathematics) and provides documentation of student progress outside of our clearly defined academic domains.
Items on all assessments were scored dichotomously, with 1 point awarded for a correct response and 0 points for an incorrect response. Plots are provided below for each content area and grade level, including the person ability and item difficulty distributions. In general, the descriptive statistics suggest that each test had an appropriate range of item difficulties, from easy to difficult, with item difficulties generally ranging from -4.0 to +4.0 on the Rasch scale. The assessments performed as expected across all grades and content areas, with the exception of Grade 7 mathematics, as noted above. The item-person distributions provided below demonstrate that the ORExt is providing a performance continuum for students who participate.
Grade 3 ELA here:
Grade 4 ELA here:
Grade 5 ELA here:
Grade 6 ELA here:
Grade 7 ELA here:
Grade 8 ELA here:
Grade 11 ELA here:
Grade 3 Math here:
Grade 4 Math here:
Grade 5 Math here:
Grade 6 Math here:
Grade 7 Math here:
Grade 8 Math here:
Grade 11 Math here:
Grade 5 Science here:
Grade 8 Science here:
Grade 11 Science here:
English Language Arts here:
Mathematics here:
Science here:
All scoring expectations for the ORExt are established within the Administration Manual (see Appendix 2.3, p. 14). The scoring procedures for the new ORExt have been simplified, with students receiving a 0 for an incorrect response or a 1 for a correct response. Input from the field gathered from Consequential Validity studies demonstrates that the assessment scoring procedures are much clearer and easier to implement than prior scoring approaches (see Appendix 2.3B.10). BRT was also commissioned to develop a scaled score interpretation guide, which describes specific strategies for interpreting student test scores, the Reading and Writing sub-test scores, and the Achievement Level Descriptors (ALDs) published within the Individual Student Reports (see Appendix 6.4C), for purposes of annual performance, growth, and Essential Skills requirements for very low performing students (see Appendix 2.1A).
The ORExt was administered in only one grade level form for the 2016-17 school year, with 36 operational items arranged in order of empirical difficulty and 12 embedded field test items.
The ORExt is provided in the standard format, but is also available in Large Print and braille formats. Test content is identical across all three versions, with an occasional item being eliminated from the braille version due to inaccessibility. These items do not count for or against the student in reporting. Substantive test comparability analyses are not feasible, given the small n-sizes of the samples involved in the alternative versions.
The ORExt technical analyses that document reliability and validity are included in this technical report (see Sections 3 and 4, respectively). ODE and BRT staff review these analyses annually. Necessary adjustments to the assessment are determined prior to implementation of the subsequent year’s work plan, which elaborates the areas of improvement as well as the aspects of the testing program that will be maintained. This decision-making is supported by input from the field gathered through the Consequential Validity study (see Appendix 2.3B.10).
One noteworthy example of the impact of our system of ongoing improvement this year is the development of additional curricular and instructional resources, which addresses an area of concern expressed by stakeholders. The training modules we developed connect the assessment results garnered from the ORExt and ORora with aligned curricular resources and instructional strategies.
The Oregon assessment system provides explicit guidance regarding the participation of all public school students in its statewide assessment program (see Section 1.4).
The assessment options for all public school students in Oregon are elaborated in the Oregon Test Administration Manual (see Appendix 1.4.2, p. 7). These options include the Smarter Balanced Assessment in English language arts and mathematics in Grades 3-8 & 11, the Oregon Assessment of Knowledge and Skills in science in Grades 5, 8, & 11, and the ORExt in the same content areas and grade levels for SWSCD (see Appendix 1.4.2, p. 92-93). Social studies assessment is a district option within the OAKS portal, as well. In addition, expectations for the English Language Proficiency Assessment (ELPA) and the Kindergarten Assessment are provided.
A student’s IEP team determines how a student with disabilities will participate in the Oregon Statewide Assessment program. The IEP team must address the eligibility criteria for participation in the ORExt before determining that the assessment is the appropriate option (see Appendix 5.1B).
As noted earlier, IEP teams make decisions regarding how students with disabilities participate in the Oregon statewide assessment program. At present, students participate in one of three options: (a) the student takes the general assessment with or without universal tools, (b) the student takes the general assessment with designated supports and/or accommodations, or (c) the student takes the ORExt. Guidelines for making universal tool, designated support, and accommodation decisions for the general assessments are provided in Appendix 2.3A.1. Guidelines for making these determinations for SWSCD who participate in AA-AAAS are provided in Appendix 5.1B.
Information regarding accessibility options for the general assessment can be found with the general assessment Peer Review evidence. For the ORExt, accessibility is treated holistically, with universal design for assessment concepts embedded in the item design and a wide variety of accommodations also available if needed. Items are crafted to be visually simple and clean. Graphic supports, which are always black/white line drawings, are embedded in all items at the low level of complexity but are phased out as items become more complex. Items are designed to incorporate simplified language unless specific academic vocabulary or concepts are being tested (see Appendix 2.3A.3). The items on the ORExt are all selected response, with three response options allowing for multiple modes of access (e.g., saying the answer, pointing to the answer, eye gaze, switch, etc.). All text presented to students is at least 18-pt font (larger, of course, in the large print version). Sample items are presented in Appendix 2.2.3. All accessibility supports, designated supports, and accommodations for the ORExt are published in Appendix 2.3A.1, p. 36-43. For students who have very limited to no communication and are unable to access even the most accessible items on the ORExt, an Oregon Observational Rating Assessment (ORora) was implemented in 2015-16. The ORora is completed by teachers and documents the student’s level of communication complexity (expressive and receptive), as well as level of independence in the domains of attention/joint attention and mathematics. The administration instructions and 2015-16 results for the ORora are included in Appendix 5.1D.
Guidance regarding appropriate accommodations is published in Appendix 2.3A.1. District and School Test Coordinators provide annual training on test security and administration. The ORExt approaches access as part of test design, as noted above in Section 5.1D. The complexity of SWSCD communication systems demands such an approach. In addition, comprehensive accommodations are allowed in order to decrease the chances that a disability may interfere with our ability to measure the student’s knowledge and skills.
ODE’s eligibility guidelines make it clear that all SWDs are eligible for the ORExt, regardless of disability category, and that specific disability category membership should not be a determining factor for considering participation (see Appendix 5.1B).
The Parent FAQ section of the General Administration Manual makes it clear that parents must be informed of the potential consequences of having their child assessed against alternate achievement standards, including diploma options. Parents are also informed that alternate achievement standards are designed to reflect a significant reduction in depth, breadth, and complexity and are therefore not comparable to general academic achievement standards (see Appendix 2.3, p. 28-32).
The ORExt is strongly linked to the CCSS/ORSci/NGSS, as evidenced by our linkage study results (see Appendix 3.1A). The claim is based on the following warrants: (a) ORExt items are aligned to the Essentialized Standards; (b) the Essentialized Standards are strongly linked to the grade level content standards; therefore (c) the ORExt items are strongly linked to grade level content expectations. It is thus expected that the ORExt promotes access to the general education curriculum by assessing general education content that has been reduced in depth, breadth, and complexity yet maintains the highest possible standard for SWSCD.
In addition, ODE commissioned BRT to work with Oregon teachers of SWSCD in the 2015-16 school year to develop a variety of curricular and instructional resources that are aligned to the Essentialized Standards. These resources include: (a) curricular templates, (b) video tutorials, and (c) supporting documents that provide specific guidance regarding how to develop lesson plans, Present Levels of Academic and Functional Performance (PLAAFP) statements, and Individualized Education Program (IEP) goals and objectives that are aligned with the Essentialized Standards. It is also expected that the essentialization process will generalize to many students who are performing off grade level, not merely to SWSCD. All resources are published on a BRT-sponsored website at BRT link
In addition to the programmatic guidance provided in Appendix 1.4A.1 related to EL program eligibility and services, ODE also provides guidance relevant to the inclusion of ELs in the statewide assessment program in Appendix 1.4.2. Though the ORExt is currently published in English, an appropriately qualified interpreter can provide the assessment to any SWSCD from diverse language backgrounds, including American Sign Language. ODE has developed a training module to increase the standardization of ASL administration for its statewide assessments, available at ASL link.
Additional information regarding the inclusion of ELs in Oregon’s general assessments is provided in the general assessment Peer Review evidence.
All statewide accommodation guidance is published in the Accessibility Manual (see Appendix 2.3A.1), outlining the universal tools and designated supports available to all students, and accommodations, available only to students with disabilities or students served by Section 504 Plans. In addition, the manual defines the supports as embedded, where they are provided by the online test engine (e.g., calculator, text-to-speech), or non-embedded, where they must be provided by a qualified assessor (e.g., read aloud, scribe). The manual also makes it clear that these supports are content-area specific, as a universal tool in one content area may be an accommodation in another.
Appropriate accommodations for the ORExt are published in Appendix 2.3A.1, p. 36-43. Additional accommodations for all statewide assessments are also published in this manual. The Oregon Accommodations Panel reviews the appropriateness of the supports listed annually. Practitioners may also request the addition of an accommodation through a formal process (see Appendix E: Approval Process for New Accessibility Supports within the manual, Appendix 2.3A.1, p. 100-102).
As noted in Sections 5.2A-C, the ORExt is accessible in any communication modality through the use of an interpreter. The accommodations published in Appendix 2.3A.1 (p. 36-43), the Oregon Accommodations Panel’s annual review, and the formal process for requesting new accessibility supports (Appendix E within the manual, Appendix 2.3A.1, p. 100-102), all described above, apply here as well.
In addition to the evidence gathered during the linkage study (see Appendix 3.1A), which suggests that the ORExt items were accessible and free of bias even before final editing, the appropriateness of the supports listed in Appendix 2.3A.1 is reviewed annually by the Oregon Accommodations Panel. Practitioners may also request the addition of an accommodation through a formal process (see Appendix E: Approval Process for New Accessibility Supports within the manual, Appendix 2.3A.1, p. 100-102). ODE is collecting accommodations codes for the ORExt from Qualified Assessors who opt to enter this information in order to make performance comparisons feasible. It is hoped that this process will be required by spring 2018. The consequential validity study for 2018 will include questions regarding the appropriateness of the available accommodations, as well.
ODE has a formal process stakeholders can use to request accommodations that are not already published in the Accessibility Manual (see Appendix E: Approval Process for New Accessibility Supports within the manual, Appendix 2.3A.1, p. 100-102).
ODE monitoring of test administration in its districts and schools is elaborated within the general assessment Peer Review evidence and is therefore not addressed here.
The Oregon Extended assessment (ORExt), Oregon’s Alternate Assessment based on Alternate Academic Achievement Standards (AA-AAAS), is part of the Oregon Statewide Assessment System. The ORExt is administered to Oregon students with the most significant cognitive disabilities (SWSCD) in English language arts and mathematics in Grades 3-8 and 11, and in science in Grades 5, 8, & 11. The ORExt links to the CCSS in English language arts and mathematics; the new ORExt is dually linked to Oregon’s former science standards, as well as to the NGSS. Results from the English language arts and math administrations are included in calculations of participation and performance for Annual Measurable Objectives (AMO), a provision of the No Child Left Behind Act (NCLB). Science participation is also included as part of the Title 1 Assessment System requirements.
The revised ORExt is built upon a vertical scale in order to support reliable determinations of annual academic growth in ELA and mathematics in Grades 3-8. The complete vertical scaling plan and operational item selection decision rules are located in Appendix 2.2.1.
The State Board of Education formally adopted the alternate academic achievement standards and achievement level descriptors (ALDs) on June 25, 2015 (see Appendix 6.1A.1). The ELA, Math, and Science standards, including both the ALDs and the requisite cut scores, are included in Appendix 6.1.A.2.
The state applies the AAAS to all public school-served SWSCD who participate in the ORExt in Grades 3-8 & 11 in English language arts and mathematics, and in Grades 5, 8, & 11 in science.
The alternate academic achievement standards in Oregon are composed of four levels (though only three are required). In ascending order, they are (a) Level 1, (b) Level 2, (c) Level 3, and (d) Level 4. Level 3 and Level 4 performances represent proficient achievement, while the bottom two levels represent achievement that is not yet proficient. The procedures followed to develop Oregon’s alternate academic achievement standards were consistent with Title 1 assessment system requirements, including the establishment of cut scores, where relevant. In order to define four levels of proficiency, Oregon set three cut scores across all subject areas: (a) to separate Level 1 from Level 2, (b) to separate Level 2 from Level 3, and (c) to separate Level 3 from Level 4. The alternate academic achievement standards in English language arts, mathematics, and science for the ORExt, including the achievement level descriptors (ALDs) and cut scores, were established during standard setting meetings held on June 15 (science), 16 (mathematics), and 17 (English language arts), 2015.
Standard Setting meetings were held at the University of Oregon in Eugene, OR on June 15, 2015 (Science), June 16, 2015 (Mathematics), and June 17, 2015 (English language arts). A total of 53 standard setters were involved in the process: 11 in Science, and 21 in both English language arts and Mathematics. Panelists were assembled in grade level teams of three, where two members were special educators and one member was a content specialist.
The panelists were highly educated. Over 90% of the panel possessed a Master’s degree or higher. Fifty-seven (57%) percent of the panelists had over 11 years of teaching experience. Seventy-six percent (76%) of the panelists had some experience working with students with significant cognitive disabilities with 64% licensed as Special Educators. The majority of panel members were female (87%), from the Northwest of the state (87%), and White (83%). No panel member self-identified with Oregon’s major minority population (Hispanic).
In addition to the live training during the standard setting meetings, panelists were asked to complete several training requirements beforehand, which oriented them to the population of students with significant cognitive disabilities (SWSCD), the Oregon Extended Assessment test design and history, and the bookmark standard setting method. Panelists were quite confident in their preparation and final judgments, as evidenced by responses to the questions: (a) “The training helped me understand the bookmark method and how to perform my role as a standard setter.” (b) “I am confident about the defensibility and appropriateness of the final recommended cut scores.” and (c) “Overall, I am confident that the standard setting procedures allowed me to use my experience and expertise to recommend cut scores for the ORExt.” The vast majority of standard setters strongly agreed with these statements, and all participants at least agreed.
The nine-step process implemented for these standard setting meetings was based on Hambleton and Pitoniak (2006), as published in R.L. Brennan (Ed.), Educational Measurement, 4th Edition (pp. 433-470). Standard setting evaluation questions posed to participants were adapted from Cizek’s Setting Performance Standards (2012). Standard setters set cut scores and recommended Achievement Level Descriptors (ALDs) for the Oregon State Board of Education to consider. The cut scores were articulated to reflect vertical development, or at least maintenance, of expectations across grades in a manner that respected standard setter judgments to the greatest possible degree. Six such articulation adjustments were made across ELA and Mathematics. Science is not built upon a vertical scale, so no cut score adjustments were necessary. The cut scores are listed below.
English language arts (ELA) table here:
Mathematics table here:
Science table here:
Note: The ELA and Math vertical scales for the ORExt are centered on 200 in grades 3-8 and can be used to document year-to-year growth. None of the other scales should be used for longitudinal comparisons. All Grade 11 scales are independent and centered on 900. The grade 5 Science scale is independent and centered on 500, while the Grade 8 Science scale is independent and centered on 800. An independent auditor evaluated the bookmarking standard setting process. The auditor’s comprehensive report can be found in Appendix 6.2.2.
Oregon educators initially evaluated new Oregon Essentialized Assessment Frameworks in two respects. First, educators were asked to determine the appropriateness of the standards selected for inclusion and exclusion in the Essentialized Standards (yes/no). Second, the level of linkage between the Essentialized Standards and grade level content standard was evaluated (0 = no link, 1 = sufficient link, 2 = strong link). Summary results are provided in the tables below. A comprehensive essentialized standard to grade level standard linkage study, as well as essentialized standard to item alignment study, is provided in Appendix 3.1A.
English language arts table here:
Mathematics table here:
Science table here:
Oregon’s reporting system facilitates appropriate, credible, and defensible interpretation and use of its assessment data. With regard to the ORExt, the purpose is to provide the state with technically adequate student performance data to ascertain proficiency on grade level state content standards for students with significant cognitive disabilities (see Sections 3 and 4). In addition, the state makes it clear that results from the Oregon Extended are not comparable to results from the SBA/OAKS (see Appendix 2.3, p. 29-31). Nevertheless, the test meets rigorous reliability expectations (see Section 4.1). Validity is considered here as an overarching summation of the Oregon Extended assessment system, as well as the mechanisms that Oregon uses to continuously improve the ORExt assessment (see Appendix 2.3B.10).
Oregon reports participation and assessment results for all students and for each of the required subgroups in its reports at the school, district, and state levels. The state does not report subgroup results when these results would reveal personally identifiable information about an individual student. The calculation rule followed is that the number of students in the subgroup must meet the minimum cell size requirement for each AMO decision: participation, achievement in English language arts and math, attendance, and graduation, where appropriate (see Appendix 2.6C).
Oregon develops and disseminates individual student data upon final determination of accuracy. The state provides districts with individual student reports (ISRs) that meet most of the relevant requirements. The state incorporated the Standard Error of Measurement (SEM) for each student score into the report templates. The SEM associated with each cut score is provided in Section 4.1B. Also, see the mock-up ISR in Appendix 6.4C.
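As an illustration of how an SEM can be used to frame a reported score, the sketch below computes a plus-or-minus one SEM band around a hypothetical scale score; the one-SEM convention and all values are assumptions for illustration, not the published ISR method.

```python
# Sketch: frame a reported score as a band rather than a point estimate.
# The score, SEM, and one-SEM convention are illustrative assumptions.

def score_band(scale_score: float, sem: float) -> tuple[float, float]:
    """Return an approximate score band of +/- one SEM."""
    return (scale_score - sem, scale_score + sem)

low, high = score_band(203.0, 4.2)  # hypothetical score and SEM
print(f"Reported score 203, likely range {low:.1f}-{high:.1f}")
```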
Oregon's student reports provide valid and reliable information regarding achievement on the assessments relative to the AAAS. The reliability of the data is addressed in Section 4.1; validity, as noted above, is considered an overarching summation of the Oregon Extended assessment system and the mechanisms used to continuously improve it. The ISRs clearly display each student's scale score relative to the AAAS for the relevant content area and grade level (see Section 4.4 and Appendix 6.4C). The Oregon ISRs provide information for parents, teachers, and administrators to help them understand and address a student's academic needs. The reports are displayed in a simple format that is easy for stakeholders to understand, and district representatives can translate results for parents as necessary. Scaled score interpretation guidance is published in Appendix 2.1A.
In sum, the rigor of the procedural development and the statistical outcomes of the ORExt were substantive and support the assessment's intended purpose. Procedural evidence includes essentialized standards development, item development, item content and bias reviews, an independent alignment study, and item selection based upon item characteristics. Outcome-related evidence includes measure reliability analyses, point-measure biserials, outfit mean squares, item difficulty and person ability distributions, and convergent and divergent validity evidence. These sources were consistently favorable and provide important validity evidence.
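For reference, the outfit mean square named above is conventionally defined for the dichotomous Rasch model as shown below; this is the standard textbook formulation, not a formula quoted from this report.

```latex
% Conventional dichotomous Rasch definitions: x_{ni} is student n's
% scored response (0/1) to item i, \theta_n the person ability, and
% b_i the item difficulty.
P_{ni} = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
\mathrm{Outfit\ MNSQ}_i
  = \frac{1}{N} \sum_{n=1}^{N} \frac{(x_{ni} - P_{ni})^2}{P_{ni}(1 - P_{ni})}
```

Values near 1.0 indicate responses consistent with model expectations; commonly cited screening ranges fall roughly between 0.5 and 1.5.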
The test development process adhered to procedural guidelines defined by the AERA/APA/NCME Standards for Educational and Psychological Testing (2014) and incorporated procedures recognized in the field as best practice. For example, an independent auditor evaluated alignment. In addition, the ORExt reflects what highly qualified Oregon educators believe represents the highest professional standards for the population of students with significant cognitive disabilities, as evidenced in our consequential validity study by teacher support of the academic content on the ORExt as well as the behaviors sampled during test administration.
Dr. Dianna Carrizales conducted an independent alignment study consisting of five evaluation components: a) standard selection for essentialization, b) strength of linkage between essentialized standards and grade level content standards, c) alignment between items and essentialized standards, d) alignment between the essentialized standards and the achievement level descriptors, and e) alignment between the achievement level descriptors and the ORExt test items. Dr. Carrizales reported that "In the three evaluations that involved determining the relationship between standards and items, reviewers identified sufficient to strong relationships among assessment components in all grades and all subject areas. In the two evaluations involving Achievement Level Descriptors, reviewers identified thirty instances of sufficient to strong relationships out of thirty-four possible relationship opportunities resulting in an overall affirmed relationship with areas for refinements identified." Overall, the documentation collected in the report suggests that the ORExt assessment system is aligned.
The test reliabilities for the ORExt were quite high, suggesting that the assessment items functioned consistently with the test as a whole. The correlations between students' content scores across subjects were not overly strong, implying that each test measures a distinct construct. The classification consistency analyses demonstrate that the ORExt is appropriately categorizing students into the proficient category and is capable of doing so in a consistent manner. The vertical scale developed in 2014-15 appears to be modeling incremental growth across Grades 3-8 in ELA and mathematics, as intended. The Grade 7 mathematics test continued to demonstrate insufficient item difficulties across the range of low, medium, and high item complexity, however, and must again be amended in the 2017-18 school year. The ELA and science assessments could continue to benefit from the addition of more difficult items, as evidenced by comparisons of the average person abilities and item difficulties. Mathematics assessments appear to be functioning quite well in terms of person abilities and item difficulties, though some additional low-level items might help increase access for the group of students functioning at that level.
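The person-item targeting comparison referenced above can be illustrated as a simple difference of means on the logit scale; the measures below are invented, and a large positive gap would suggest, as noted, that harder items are needed.

```python
# Illustrative targeting check: mean person ability minus mean item
# difficulty on the logit scale. All measures below are invented.

def targeting_gap(person_abilities: list[float],
                  item_difficulties: list[float]) -> float:
    """Mean ability minus mean difficulty, in logits."""
    return (sum(person_abilities) / len(person_abilities)
            - sum(item_difficulties) / len(item_difficulties))

abilities = [0.8, 1.2, -0.3, 1.9, 0.5]       # hypothetical person measures
difficulties = [-0.6, 0.1, -1.2, 0.4, -0.2]  # hypothetical item measures
print(f"Targeting gap: {targeting_gap(abilities, difficulties):+.2f} logits")
# -> Targeting gap: +1.12 logits (persons are, on average, above the items)
```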
The Oregon Observational Rating Assessment (ORora) results demonstrate that approximately 17-25% of the SWSCD who participated in the ORExt also took the ORora, depending upon grade level. A total of 755 students across all tested grades were administered the ORora in the 2016-17 test administration. The participants were primarily students with multiple, severe disabilities and very limited communication systems. Analyses of missing data patterns for the ORExt demonstrated that QAs were generally able to adhere to the discontinuation rules. Response patterns on the ORExt were compared to ORora results to determine what percentage of QAs administered the ORora due to the minimum participation rule and what percentage administered it of their own volition. Analyses showed that 234 students were eligible to take the ORora in English language arts, 241 in mathematics, and 86 in science. This means that about 30 students per grade, per content area, received five or fewer correct responses within the first 15 items administered on the ORExt. Of the 561 test records that met ORora eligibility requirements, 91 were not administered the ORora. In addition, 82 students each in ELA and math were administered the ORora without having participated in the ORExt (74 of these students were the same across both content areas, with eight unique to each).
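The minimum participation rule described above can be expressed directly; in the sketch below (hypothetical function name and invented responses), a student is flagged as ORora-eligible when five or fewer of the first 15 administered ORExt items are answered correctly.

```python
# Sketch of the ORora minimum participation rule: eligibility is triggered
# by five or fewer correct responses within the first 15 items administered.
# The function name and response vector are hypothetical.

def orora_eligible(responses: list[int]) -> bool:
    """Responses are scored 1 (correct) or 0 (incorrect), in administration order."""
    if len(responses) < 15:
        raise ValueError("The rule is evaluated on the first 15 administered items.")
    return sum(responses[:15]) <= 5

# Four correct responses in the first 15 items -> eligible (True)
print(orora_eligible([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]))
```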
The 2016-17 Oregon Consequential Validity study provides important information for future administrations of the ORExt. The results demonstrate that the test continues to be easy to administer and score and is providing an accessible and appropriate representation of the knowledge and skills that should be required of SWSCD in Oregon. Areas of requested improvement include the provision of a tablet-based administration, which is already planned for 2017-18, and the development of additional life skills items, which cannot be accomplished while maintaining rigorous academic expectations that are linked to Oregon content standards.
The 2016-17 Oregon Extended Assessment Pilot Tablet Administration demonstrated that Oregon teachers highly value provision of a tablet-based administration of the ORExt at the statewide level. Benefits of a tablet-based administration included: increased student engagement, improved standardization, ease of use by teachers, and resource protection (i.e., time, printing, expense). The results also suggest that more robust systems are needed to support user access to the testing application via an automatic username and password process. Focus Group members also recommended that practice items be developed in a tablet format so qualified assessors and students can practice with the tablet administration in preparation for the ORExt test window.
Documenting evidence of validity remains an ongoing and continuous process. Our efforts to continue to improve the assessment system are outlined below, as well as in Sections 3 and 4 above. We also have studies planned over the course of the next three years that will help to solidify the evidence that is accumulating. All of the evidence at hand suggests that the ORExt is sufficient to its stated purpose of providing reliable determinations of student proficiency at the test level in order to support systems-level analysis of district and state programs. The ORExt should continue to improve over time through field testing and constant monitoring and review, and additional validity evidence will be gathered.
As mentioned above in Section 3.1A, data are presented to support the claim that Oregon's AA-AAAS provides the state with technically adequate student performance data to ascertain proficiency on grade level state content standards for students with significant cognitive disabilities, which is its defined purpose. In this technical report, we have provided content validity evidence related to the ORExt test development process (i.e., essentialization process, linkage study, distributed item review, test blueprint, item writer training and demographics, and item reviewer training and demographics), ORExt test reliability evidence, and ORExt consequential validity evidence. Further analyses over the coming years are planned to continue the development of technical documentation for the overall construct validity of the ORExt. The technical documentation plan for the 2016 through 2019 school years is provided below:
Technical documentation plan for the 2016 through 2019 table here:
Appendix descriptions table here:
Appendix 1.1 explains the development process and intended uses for the Essentialized Assessment Frameworks (EAFs). The EAFs contain the Essentialized Standards (EsSt), which are linked to grade level content standards, and the ORExt is aligned to the EAFs as well. While the EAFs primarily guide item development, they are also intended to be used in the development of appropriate Present Levels of Academic Achievement and Functional Performance (PLAAFP) statements and Individualized Education Program (IEP) goals and objectives.
Appendix 1.2 conveys the evaluation conducted by researchers at the Fordham Institute, which compared then-current state standards to the CCSS in terms of rigor. The findings generally show that the CCSS are as rigorous or more rigorous than state standards.
Appendix 1.4.1 is the Executive Memo from the Governor of Oregon regarding parent opt-out expectations.
Appendix 1.4.2 is the test administration manual (TAM) for all assessments in the Oregon statewide assessment system, including the SBA, OAKS, the ORExt, the Kindergarten Assessment, and the ELPA. The TAM elaborates all relevant test security and administration procedures.
Appendix 1.4A.1 is ODE’s English Learner Program Guide, outlining English learner (EL) system requirements in the areas of student identification, services, reporting, and assessment for ELs in Oregon’s public schools, including ELs who are SWD.
Appendix 1.4A.2 contains Oregon's regulations that require ODE to provide translated OAKS assessments for language populations at or above 9% in grades K-12 within three years after the school year in which the language exceeds the threshold.
Appendix 1.5 is Oregon's annual report to the state legislature for the 2015-16 school year. The report includes student demographics and information on student groups, school funding and staff information, test results, graduation and dropout rates, charter school data and information on alternative education programs, early childhood data, and attendance and chronic absenteeism data.
Appendix 2.1 is the test specifications document that describes our approach to assessment and test design for the ORExt. The document includes our approach to RDBC, an overview of the essentialization process and EAF documents, the anticipated operational test design for the ORExt, test development considerations, sample test items, item specifications, and universal tools/designated supports/accommodations.
Appendix 2.1A provides the field with comprehensive information related to scaled score interpretation for the ORExt. The guidance is published in three main areas: 1) Annual performance, 2) Annual growth, and 3) Performance for very low functioning students. Guidance regarding use and interpretation of reading and writing subscores is also provided.
Appendix 2.1B is the test blueprint for the ORExt, conveying the balance of representation of domains across the content areas and grade levels assessed. Operational items are selected to reflect the representation percentages included in the test blueprint.
Appendix 2.1C describes the eight-step item development process used to develop items for the ORExt, from standard selection to test booklet formation. The item development process is specific and explicit in order to increase transparency.
Appendix 2.2.1 is the set of PPT slides that were used to train item writers for the ORExt. Item writers were also provided an orientation to the test specifications as part of training.
Appendix 2.2.2 is a document that summarizes the balanced design vertical scaling plan employed for the ORExt in the 2014-15 administration. The document includes the domain sampling plan for all assessments, as well as the decision rules employed to remove items from the operational item pool prior to vertical scaling and standard setting procedures.
Appendix 2.2.3 provides stakeholders with a visual representation of the structure of the ORExt. Sample items are conveyed in English language arts, mathematics, and science, with the scoring protocol and student materials presented together. Stakeholders can see the structure of each item, as well as how the items are scored, and can get a sense of the answer choice formats used within the student materials documents.
Appendix 2.3 is ODE's General Administration and Scoring Manual for 2016-17. The manual establishes ODE's expectations regarding the test window, utilizing the ORExt training and proficiency website, using the sign language interpreter training and proficiency website, and informing parents. It also provides the following information for stakeholders, including educators and parents: an overview of the extended assessments, assessing a student, scoring, decision making, and information for teachers. The manual provides three appendices that offer guidance regarding the provision of supports, parent questions and answers, and a glossary.
Appendix 2.3A.1 is the 2016-17 accessibility options manual for all assessments in the Oregon statewide assessment system, including the SBA, OAKS, the ORExt, and the ELPA. Options include Universal Tools, Designated Supports, and Accommodations. The manual provides guidance regarding use of these options in instruction and assessment, as well as implementation strategies and use evaluation. Each accommodation is coded for use in data analysis related to assessment scores for the SBA and OAKS.
Appendix 2.3A.2 is ODE’s How to Select, Administer, and Evaluate Accommodations on Oregon’s Statewide Assessment manual for 2013-14. The manual trains users regarding how to implement and evaluate appropriate accommodations, from the student level to the systems level.
Appendix 2.3A.3 is a document that summarizes the procedures used during item development to reduce item depth, breadth, and complexity, in addition to the test specifications information found in Appendix 2.1. The document also provides more detail regarding how language complexity is addressed and reviewed in an effort to decrease the language load of items and make the test more accessible to all students. The document also discusses ways in which bias is addressed during test development.
Appendices 2.3B.1 and 2.3B.2 are the PowerPoint (PPT) trainings that were used by ODE and BRT trainers to train new qualified assessors (QAs) and qualified trainers (QTs) in four regionally hosted trainings in November 2016. QTs also used the package to train New Qualified Assessors for the 2016-17 school year. The training provides participants with the information needed to pass proficiency tests as part of the requirements to become a QA for the Oregon Extended Assessments and was delivered by QTs throughout the state. The training package addresses the following topics: “What’s new in 2016-17?”, “2017 Test Window”, “Eligibility - which students take AA-AAAS?”, “Test administration”, “Student Confidentiality & Test Security”, “Test Administration (Physical & Logistic)”, “Scoring & Data Entry”, “Reports & Sharing Results with Parents”, “Navigating the Training and Proficiency website”, and “Resources.”
Appendix 2.3B.4 is the test calendar for the entire Oregon statewide assessment program, including the SBA, OAKS, the ORExt, the ELPA, the Kindergarten Assessment, and the NAEP.
Appendix 2.3B.5 is a sample agenda that ODE makes available to QTs around the state to train their respective new QAs as they implement the train-the-trainers model used by the Oregon Extended assessment.
Appendix 2.3B.6 is the list of instructions provided to new QAs and QTs regarding how to access the online training and proficiency website.
Appendix 2.3B.7 is the list of responsibilities associated with being a QT for the ORExt assessment.
Appendix 2.3B.8 is the document that contains the most commonly fielded questions and answers from stakeholders, including parents and teachers.
Appendix 2.3B.9 is the report that summarizes all of the technical assistance questions garnered from the field this year. Efforts are made to find any patterns that our team may use to improve training for the following year.
Appendix 2.3B.10 is the consequential validity report for the spring 2017 consequential validity study conducted by BRT. The report provides documentation of the perceptions in the field related to both intended and unintended academic and social consequences of the ORExt.
Appendix 2.3C is the ORExt Pilot Tablet Administration report for the spring 2017 tablet administration, Phase 2, study conducted by BRT. The report provides the research plan, summaries of results, and lessons learned regarding how to approach statewide operational tablet administration planned for next year.
Appendix 2.6C is the manual defining the state of Oregon’s policies and procedures regarding how students are included in AMO reporting, including how achievement, growth, and graduation rates are reported for student groups and subgroups.
Appendix 3.1A is a document that summarizes the independent alignment study process and participants used to review the linkage between the Essentialized Standards and grade level content standards (CCSS in ELA and math; ORSci and NGSS in science), as well as the alignment between ORExt test items and those Essentialized Standards. In addition, reviewers rated the items for potential bias and access concerns. All data were gathered using the Distributed Item Review (DIR) website, supported by a webinar training and ongoing technical assistance. The results of the 2014-15 linkage study, which was conducted by BRT researchers rather than independently, are also included.
Appendix 3.1B is a document that describes the Distributed Item Review (DIR) website used by Oregon teachers to evaluate the alignment between ORExt test items and Essentialized Standards. Reviewers also rated the items for potential bias and access concerns. All data were gathered using the DIR website, supported by a webinar training and ongoing technical assistance.
Appendix 4.1B conveys the historical development of the ORExt from 1999 to the present, including the grade levels/bands assessed, content areas assessed, and the targeted content standards.
Appendix 4.2 includes the most current published state level data regarding Oregon’s ethnic diversity.
Appendix 5.1B is the revised and rigorous guidance that ODE has provided to IEP teams to assist them in making appropriate assessment eligibility determinations for students with disabilities.
Appendix 5.1D includes a summary report of the statewide results and the administration and scoring instructions for the new Oregon Observational Rating Assessment (ORora). The ORora is administered to all students whose ORExt testing was discontinued. It provides information regarding student progress in terms of functional skills in adaptive and communication domains for the small subgroup of students who are unable to meet the academic expectations in the ORExt.
Appendix 6.1A.1 is the agenda and minutes that document the hearing and adoption of the AAAS for the ORExt on June 25, 2015.
Appendix 6.1A.2 includes all of the achievement level descriptors (ALDs) and cut scores that define performance for the ORExt in qualitative and quantitative fashions, respectively. These Alternate Academic Achievement Standards (AAAS) describe what students should know and be able to do based upon their performance on the ORExt.
Appendix 6.2.1 contains the PPT slides used to train standard setters during the June 2015 standard setting meetings for ELA, math, and science.
Appendix 6.2.2 is a standard setting report generated by an independent auditor. The report provides a comprehensive evaluation of the bookmark standard setting procedure employed for the ORExt on June 15-17, 2015.
Appendix 6.4C is a document that displays the individual student report (ISR) that ODE publishes for students who participate in the ORExt. The mock-up includes cut scores and achievement level descriptors (ALDs), as well as links to the ODE website for additional information.